ICRA 2026

COMPASS: Cross-embOdiment Mobility Policy Via ResiduAl RL and Skill Synthesis

Wei Liu, Huihua Zhao, Chenran Li, Yuchen Deng, Joydeep Biswas, Yan Chang, Soha Pouya

AI summary

COMPASS enables scalable cross-embodiment mobility by pre-training an imitation learning (IL) policy on a single robot, adapting it to new embodiments with residual reinforcement learning, and distilling the resulting specialists into a unified generalist policy, achieving a success rate approximately 5X higher than the pre-trained IL policy.

cross-embodiment mobility, residual reinforcement learning, imitation learning, policy distillation, robot generalization, sim-to-real transfer

Problem

Classical mobility stacks require extensive per-robot tuning, while learning-based approaches like imitation learning demand prohibitively large amounts of high-quality demonstrations for each new robot embodiment, hindering scalability across diverse morphologies.

Approach

The framework pre-trains a base mobility policy on a single robot using imitation learning, efficiently adapts it to new embodiments via residual reinforcement learning, and distills embodiment-specific specialists into a single generalist policy conditioned on an embodiment embedding.
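
The residual step can be illustrated with a minimal sketch. It assumes the common additive form of residual RL, in which a frozen base policy proposes an action and a learned residual policy adds a correction; the summary does not state how COMPASS combines the two, so the additive rule, the `residual_scale` parameter, and the toy policies below are all hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def compose_residual_action(
    base_policy: Callable[[Sequence[float]], Vector],
    residual_policy: Callable[[Sequence[float]], Vector],
    observation: Sequence[float],
    residual_scale: float = 1.0,
) -> Vector:
    """Return the base action plus a scaled corrective residual.

    Assumption: additive composition. Only the residual policy would be
    trained with RL; the base policy stays fixed.
    """
    base_action = base_policy(observation)
    residual = residual_policy(observation)
    if len(base_action) != len(residual):
        raise ValueError("base and residual actions must have the same dimension")
    return [b + residual_scale * r for b, r in zip(base_action, residual)]


# Hypothetical stand-ins for the IL base policy and the RL residual policy.
# Actions are [linear_velocity, angular_velocity].
def toy_base_policy(observation: Sequence[float]) -> Vector:
    return [0.5, 0.0]  # drive straight at a fixed speed


def toy_residual_policy(observation: Sequence[float]) -> Vector:
    return [-0.25, 0.125]  # slow down and turn slightly for this embodiment


action = compose_residual_action(toy_base_policy, toy_residual_policy, [0.0, 0.0])
print(action)
```

In this form the residual only has to learn the difference between what the base policy commands and what a new embodiment needs, which is one plausible reading of why adaptation can be efficient.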

Key results

Achieves ~5X higher success rate and 3X travel efficiency over IL baseline
Generalizes across wheeled, quadruped, and humanoid platforms in complex environments
Enables zero-shot sim-to-real transfer
Requires expert demonstrations from only a single embodiment

Why it matters

It provides a scalable, data-efficient pipeline for deploying robust mobility policies across heterogeneous robot platforms without costly per-robot data collection or manual retuning.

Abstract

As robots are increasingly deployed in diverse application domains, enabling robust mobility across different embodiments has become a critical challenge. Classical mobility stacks, though effective on specific platforms, require extensive per-robot tuning and do not scale easily to new embodiments. Learning-based approaches, such as imitation learning (IL), offer alternatives, but are significantly limited by the need for high-quality demonstrations for each embodiment. To address these challenges, we introduce COMPASS, a unified framework that enables scalable cross-embodiment mobility using expert demonstrations from only a single embodiment. We first pre-train a mobility policy on a single robot using IL, combining a world model with a policy model. We then apply residual reinforcement learning (RL) to efficiently adapt this policy to diverse embodiments through corrective refinements. Finally, we distill specialist policies into a single generalist policy conditioned on an embodiment embedding vector. This design significantly reduces the burden of collecting data while enabling robust generalization across a wide range of robot designs. Our experiments demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than the pre-trained IL policy, and further demonstrate zero-shot sim-to-real transfer. Project page: https://nvlabs.github.io/COMPASS
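
The distillation step can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the generalist is a single linear layer, the embodiment embeddings are hand-written one-hot vectors, and the distillation objective is a mean squared error against specialist actions. The abstract specifies none of these, so every name and value below is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# One distillation sample: (robot name, observation, specialist's action).
Sample = Tuple[str, Vector, Vector]


def generalist_action(
    weights: Sequence[Sequence[float]],
    observation: Sequence[float],
    embodiment_embedding: Sequence[float],
) -> Vector:
    """Toy linear generalist: action = W @ concat(observation, embedding)."""
    features = list(observation) + list(embodiment_embedding)
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def distillation_loss(
    weights: Sequence[Sequence[float]],
    batch: Sequence[Sample],
    embeddings: Dict[str, Vector],
) -> float:
    """Mean squared error between generalist and specialist actions (assumed loss)."""
    total, count = 0.0, 0
    for robot, observation, specialist_action in batch:
        predicted = generalist_action(weights, observation, embeddings[robot])
        total += sum((p - t) ** 2 for p, t in zip(predicted, specialist_action))
        count += len(specialist_action)
    return total / count


# Hypothetical one-hot embeddings for two embodiments.
embeddings = {"wheeled": [1.0, 0.0], "quadruped": [0.0, 1.0]}
# One output dimension; inputs are [observation, embedding_0, embedding_1].
weights = [[1.0, 0.5, -0.5]]
batch = [
    ("wheeled", [2.0], [2.5]),    # generalist predicts 2.5, matching the specialist
    ("quadruped", [2.0], [1.0]),  # generalist predicts 1.5, an error of 0.5
]
loss = distillation_loss(weights, batch, embeddings)
print(loss)
```

The same observation yields different actions for the two robots only because the embedding differs, which is the role the abstract assigns to the embodiment embedding vector.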

Index terms

Reinforcement Learning, Vision-Based Navigation