ICRA 2026

Towards Proprioception-Aware Embodied Planning for Dual-Arm Humanoid Robots

Boyu Li, Siyuan He, Hang Xu, Haoqi Yuan, Xinrun Xu, Yu Zang, Liwei Hu, ZhenXiong Jiang, Junpeng Yue, Pengbo Hu, Börje F. Karlsson, Dongbin Zhao, Yehui Tang, Zongqing Lu

AI summary

Incorporating proprioceptive information into multimodal large language models improves long-horizon planning and spatial reasoning for dual-arm humanoid robots, with a reported average gain of 19.75% in planning performance.
Dual-arm humanoid, Embodied AI, Proprioception, Multimodal LLMs, Simulation benchmark, Long-horizon planning

Problem

Current multimodal large language models lack embodiment awareness for dual-arm humanoid planning, and existing simulation benchmarks fail to support physically realistic, long-horizon task evaluation and data collection.

Approach

The authors introduce DualTHOR, a physically realistic dual-arm humanoid simulator, and Proprio-MLLM, a model that grounds multimodal language planning in the robot's proprioceptive state using a motion-based position embedding and a cross-spatial encoder.

Key results

  • DualTHOR simulator with continuous transitions and stochastic contingency mechanisms
  • Proprio-MLLM architecture integrating proprioceptive data via a motion-based position embedding and a cross-spatial encoder
  • 19.75% average improvement in planning performance over existing MLLMs
  • Open-source benchmark and dataset for long-horizon dual-arm embodied planning

Why it matters

It provides a crucial simulation benchmark and a physically grounded planning framework to advance the development of robust, real-world capable dual-arm humanoid robots.

Abstract

In recent years, Multimodal Large Language Models (MLLMs) have demonstrated the ability to serve as high-level planners, enabling robots to follow complex human instructions. However, their effectiveness, especially in long-horizon tasks involving dual-arm humanoid robots, remains limited. This limitation arises from two main challenges: (i) the absence of simulation platforms that systematically support task evaluation and data collection for humanoid robots, and (ii) the insufficient embodiment awareness of current MLLMs, which hinders reasoning about dual-arm selection logic and body positions during planning. To address these issues, we present DualTHOR, a new dual-arm humanoid simulator, with continuous transition and a contingency mechanism. Building on this platform, we propose Proprio-MLLM, a model that enhances embodiment awareness by incorporating proprioceptive information with motion-based position embedding and a cross-spatial encoder. Experiments show that, while existing MLLMs struggle in this environment, Proprio-MLLM achieves an average improvement of 19.75% in planning performance. Our work provides both an essential simulation platform and an effective model to advance embodied intelligence in humanoid robotics. The code is available at https://anonymous.4open.science/r/DualTHOR-5F3B/.

Index terms

Integrated Planning and Learning; Simulation and Animation
