Cross-Embodiment Transfer Via Behavior-Aligned Representations
Ajay Sridhar, Jensen Gao, Jonathan Yang, Jean Mercat, Suneel Belkhale, Dorsa Sadigh
AI summary
Problem
Cross-embodiment transfer remains difficult due to mismatched observations and action spaces across robot platforms, hindering the use of large-scale heterogeneous datasets.
Approach
The method trains vision-language-action models to implicitly align diverse robot data by jointly predicting behavior-aligned representations alongside actions, evaluated on a new simulation benchmark and real robots.
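One way to picture joint prediction is as a combined training objective. The sketch below is illustrative only and is not taken from the paper: it assumes both actions and representations are discretized into tokens, and the names `joint_loss` and `rep_weight` are hypothetical. It also assumes that action-free examples contribute only the representation term, which is one plausible reading of how such data could be used.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mean_token_loss(logits_seq, tokens):
    """Average cross-entropy over a sequence of token predictions."""
    losses = [cross_entropy(l, t) for l, t in zip(logits_seq, tokens)]
    return sum(losses) / len(losses)


def joint_loss(rep_logits, rep_tokens, action_logits=None,
               action_tokens=None, rep_weight=1.0):
    """Hypothetical joint objective: action loss + weighted representation loss.

    If `action_tokens` is None (assumed action-free example), only the
    representation term is returned.
    """
    rep = mean_token_loss(rep_logits, rep_tokens)
    if action_tokens is None:
        return rep_weight * rep
    act = mean_token_loss(action_logits, action_tokens)
    return act + rep_weight * rep


# Toy example: uniform logits over a 2-token vocabulary.
uniform = [[0.0, 0.0]]
with_actions = joint_loss(uniform, [0], uniform, [1])
action_free = joint_loss(uniform, [0], rep_weight=0.5)
```

Under this assumed design, a single weight controls how strongly the representation targets shape the shared model, and the same function covers both robot data with actions and data without them.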
Key results
- End-effector traces yield the strongest transfer gains
- Representation benefits scale with larger prior datasets
- Inference-time representation prediction is unnecessary
- Representations improve sim-to-real transfer, raising real-robot task completion progress by 28%
Why it matters
Provides a scalable pathway for generalist robot policies to leverage heterogeneous data across evolving hardware platforms.
Abstract
Recent progress in large-scale imitation learning for robot manipulation has been driven by leveraging datasets across a wide range of robot embodiments. However, achieving significant cross-embodiment transfer is often still challenging. In this work, we study the role of using behavior-aligned representations (e.g., object bounding boxes, language motions, end-effector traces of robot motion) in vision-language-action (VLA) models to promote cross-embodiment transfer. We hypothesize that by possessing invariances across embodiments while being predictive of robot actions, these representations can help unify large-scale cross-embodiment data to enhance transfer. To assess our hypothesis, we develop a simulation-based benchmark designed to assess transfer with diverse cross-embodiment data to new embodiments. Using this benchmark, we compare different representations and ways of incorporating them. We identify that end-effector traces can be particularly beneficial for transfer, representations are generally more useful with larger prior datasets, and can be used to benefit from action-free data. We also demonstrate that they can enhance sim-to-real cross-embodiment transfer, improving task completion progress of real robot policies pre-trained on simulation data by 28%. We provide videos of our evaluations at our website https://ajaysridhar.com/barx/.