COMPASS: Cross-embOdiment Mobility Policy Via ResiduAl RL and Skill Synthesis
Wei Liu, Huihua Zhao, Chenran Li, Yuchen Deng, Joydeep Biswas, Yan Chang, Soha Pouya
AI summary
Problem
Classical mobility stacks require extensive per-robot tuning, while learning-based approaches like imitation learning demand prohibitively large amounts of high-quality demonstrations for each new robot embodiment, hindering scalability across diverse morphologies.
Approach
The framework pre-trains a base mobility policy on a single robot using imitation learning, efficiently adapts it to new embodiments via residual reinforcement learning, and distills embodiment-specific specialists into a single generalist policy conditioned on an embodiment embedding.
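To make the residual adaptation step concrete, the following is a minimal sketch, not the authors' implementation. It assumes the residual is added to the base policy's output in action space, that the base policy stays frozen during adaptation, and that commands are clipped to a symmetric range. The names `residual_action`, `residual_scale`, and `action_limit`, and the two stand-in policies, are hypothetical.

```python
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]


def residual_action(
    base_policy: Policy,
    residual_policy: Policy,
    observation: Sequence[float],
    residual_scale: float = 1.0,
    action_limit: float = 1.0,
) -> List[float]:
    """Combine a base action with a learned corrective residual."""
    base = base_policy(observation)       # pre-trained IL policy (assumed frozen)
    delta = residual_policy(observation)  # embodiment-specific correction trained with RL
    combined = [b + residual_scale * d for b, d in zip(base, delta)]
    # Clip to the (assumed) symmetric command range of the target robot.
    return [max(-action_limit, min(action_limit, a)) for a in combined]


# Hypothetical stand-ins: actions are [linear velocity, angular velocity].
def base_policy(obs: Sequence[float]) -> List[float]:
    return [0.5, 0.0]


def residual_policy(obs: Sequence[float]) -> List[float]:
    return [0.1, -0.2]


command = residual_action(base_policy, residual_policy, observation=[0.0, 0.0])
```

Under these assumptions only the residual is trained for each new embodiment, so RL has to learn a correction to existing behavior rather than a mobility policy from scratch.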
Key results
- Achieves a ~5X higher success rate and 3X travel efficiency over the IL baseline
- Generalizes across wheeled, quadruped, and humanoid platforms in complex environments
- Enables zero-shot sim-to-real transfer
- Requires expert demonstrations from only a single embodiment
Why it matters
It provides a scalable, data-efficient pipeline for deploying robust mobility policies across heterogeneous robot platforms without costly per-robot data collection or manual retuning.
Abstract
As robots are increasingly deployed in diverse application domains, enabling robust mobility across different embodiments has become a critical challenge. Classical mobility stacks, though effective on specific platforms, require extensive per-robot tuning and do not scale easily to new embodiments. Learning-based approaches, such as imitation learning (IL), offer an alternative but are limited by their need for high-quality demonstrations for each embodiment. To address these challenges, we introduce COMPASS, a unified framework that enables scalable cross-embodiment mobility using expert demonstrations from only a single embodiment. We first pre-train a mobility policy on a single robot using IL, combining a world model with a policy model. We then apply residual reinforcement learning (RL) to efficiently adapt this policy to diverse embodiments through corrective refinements. Finally, we distill specialist policies into a single generalist policy conditioned on an embodiment embedding vector. This design significantly reduces the burden of collecting data while enabling robust generalization across a wide range of robot designs. Our experiments demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than that of the pre-trained IL policy. COMPASS further demonstrates zero-shot sim-to-real transfer. Project page: https://nvlabs.github.io/COMPASS
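The distillation step described in the abstract can be illustrated with a toy sketch, under loudly stated assumptions: a linear student, a squared-error imitation loss, one-hot embodiment embeddings, and two made-up specialists. The abstract specifies none of these; every name below (`EMBEDDINGS`, `SPECIALISTS`, `generalist`, `distill_step`, `train_generalist`) is hypothetical and only shows how a single policy can reproduce several specialists when conditioned on an embedding.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical one-hot embodiment embeddings; a learned vector could be used instead.
EMBEDDINGS: Dict[str, List[float]] = {"wheeled": [1.0, 0.0], "legged": [0.0, 1.0]}

# Toy specialists (teachers): same response to the observation, different offsets.
SPECIALISTS = {
    "wheeled": lambda obs: [obs[0] + 0.2],
    "legged": lambda obs: [obs[0] - 0.3],
}


def generalist(weights: List[List[float]], obs: Sequence[float],
               embedding: Sequence[float]) -> List[float]:
    """Linear student: action = W @ concat(obs, embedding)."""
    features = list(obs) + list(embedding)
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def distill_step(weights, obs, embedding, teacher_action, lr=0.1) -> float:
    """One SGD step on the squared error between student and teacher actions."""
    features = list(obs) + list(embedding)
    student_action = generalist(weights, obs, embedding)
    loss = 0.0
    for i, (s, t) in enumerate(zip(student_action, teacher_action)):
        err = s - t
        loss += err * err
        for j, x in enumerate(features):
            weights[i][j] -= lr * 2.0 * err * x  # gradient of err**2 w.r.t. W[i][j]
    return loss


def train_generalist(steps: int = 2000, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    weights = [[0.0, 0.0, 0.0]]  # one action dimension, three input features
    for _ in range(steps):
        name = rng.choice(sorted(SPECIALISTS))  # sample an embodiment
        obs = [rng.uniform(-1.0, 1.0)]          # sample a toy observation
        distill_step(weights, obs, EMBEDDINGS[name], SPECIALISTS[name](obs))
    return weights


weights = train_generalist()
wheeled_action = generalist(weights, [0.4], EMBEDDINGS["wheeled"])
legged_action = generalist(weights, [0.4], EMBEDDINGS["legged"])
```

In this toy setting the same weights yield different actions for the same observation depending on the embedding passed in, which is the role the embodiment embedding vector plays for the generalist policy.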