TrajBooster: Boosting Humanoid Whole-Body Manipulation Via Trajectory-Centric Learning
Jiacheng Liu, Pengxiang Ding, Qihang Zhou, Yuxuan Wu, Da Huang, Zimian Peng, Wei Xiao, Weinan Zhang, Lixin Yang, Cewu Lu, Donglin Wang
AI summary
Problem
Bipedal humanoid VLA models struggle to align with new action spaces because high-quality demonstration data is scarce, particularly for wide-range, whole-body manipulation beyond tabletop tasks.
Approach
TrajBooster extracts 6D end-effector trajectories from wheeled humanoid datasets, retargets them in simulation to a bipedal robot using a hierarchical controller, and post-pre-trains a VLA with this synthetic data before fine-tuning on minimal real teleoperation data.
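The data flow described above can be illustrated with a minimal sketch: source-robot frames carry an image, a language instruction, and dual-arm 6D end-effector poses, and a retargeting function turns those poses into target-robot actions that are paired back with the original vision and language. Every name here (`SourceFrame`, `Triplet`, `build_triplets`, `toy_retarget`) is hypothetical, and `toy_retarget` is a stand-in placeholder, not the hierarchical controller used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumed pose layout for illustration: (x, y, z, roll, pitch, yaw).
Pose6D = Tuple[float, float, float, float, float, float]


@dataclass
class SourceFrame:
    """One timestep recorded on the source (wheeled) robot. Hypothetical schema."""
    image_id: str
    instruction: str
    left_ee: Pose6D
    right_ee: Pose6D


@dataclass
class Triplet:
    """Source vision and language paired with a target-robot action."""
    image_id: str
    instruction: str
    target_action: List[float]


def toy_retarget(left: Pose6D, right: Pose6D) -> List[float]:
    """Placeholder retargeter, NOT the paper's controller.

    Concatenates both wrist poses and appends one extra command:
    the mean wrist height, standing in for a lower-body height target.
    """
    mean_height = (left[2] + right[2]) / 2.0
    return list(left) + list(right) + [mean_height]


def build_triplets(
    frames: Sequence[SourceFrame],
    retarget: Callable[[Pose6D, Pose6D], List[float]],
) -> List[Triplet]:
    """Keep each frame's image and instruction; replace its action via `retarget`."""
    return [
        Triplet(f.image_id, f.instruction, retarget(f.left_ee, f.right_ee))
        for f in frames
    ]


frames = [
    SourceFrame(
        image_id="img0",
        instruction="pick up the cup",
        left_ee=(0.3, 0.2, 0.8, 0.0, 0.0, 0.0),
        right_ee=(0.3, -0.2, 0.6, 0.0, 0.0, 0.0),
    )
]
triplets = build_triplets(frames, toy_retarget)
```

The point of the sketch is the interface: only the end-effector poses cross from the source robot to the target robot, so the retargeter can be swapped without touching the vision and language side of the dataset.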
Key results
- First real-world VLA deployment for bipedal whole-body manipulation using cross-embodiment data
- Hierarchical retargeting model successfully maps wheeled trajectories to bipedal whole-body actions
- Post-pre-training accelerates VLA adaptation, requiring only 10 minutes of real teleoperation
- Achieves robust beyond-tabletop tasks like squatting and cross-height manipulation on Unitree G1
Why it matters
Provides a scalable, data-efficient pipeline for training bipedal humanoid manipulators, reducing reliance on costly same-embodiment teleoperation and advancing real-world humanoid robotics.
Abstract
Recent Vision-Language-Action (VLA) models show potential to generalize across embodiments but struggle to quickly align with a new robot’s action space when high-quality demonstrations are scarce, especially for bipedal humanoids. We present TrajBooster, a cross-embodiment framework that leverages abundant wheeled-humanoid data to boost bipedal VLA. Our key idea is to use end-effector trajectories as a morphology-agnostic interface. TrajBooster (i) extracts 6D dual-arm end-effector trajectories from real-world wheeled humanoids, (ii) retargets them in simulation to Unitree G1 with a whole-body controller trained via a heuristic-enhanced harmonized online DAgger to lift low-dimensional trajectory references into feasible high-dimensional whole-body actions, and (iii) forms heterogeneous triplets that couple source vision/language with target humanoid-compatible actions to post-pre-train a VLA, followed by only 10 minutes of teleoperation data collection on the target humanoid domain. Deployed on Unitree G1, our policy achieves beyond-tabletop household tasks, enabling squatting, cross-height manipulation, and coordinated whole-body motion with markedly improved robustness and generalization. Results show that TrajBooster allows existing wheeled-humanoid data to efficiently strengthen bipedal humanoid VLA performance, reducing reliance on costly same-embodiment data while enhancing action space understanding and zero-shot skill transfer capabilities. For more details, please refer to our webpage https://jiachengliu3.github.io/TrajBooster.
∗Equal contribution. †Equal advising.
1Zhejiang University, 2Westlake University, 3Shanghai Jiao Tong University, 4Shanghai Innovation Institute.