ICRA 2026

TrajBooster: Boosting Humanoid Whole-Body Manipulation Via Trajectory-Centric Learning

Jiacheng Liu, Pengxiang Ding, Qihang Zhou, Yuxuan Wu, Da Huang, Zimian Peng, Wei Xiao, Weinan Zhang, Lixin Yang, Cewu Lu, Donglin Wang

AI summary

Cross-embodiment trajectory retargeting enables a VLA to master bipedal whole-body manipulation with only 10 minutes of real-world data.

Cross-embodiment learning; Vision-language-action models; Whole-body manipulation; Trajectory retargeting; Data-efficient fine-tuning; Humanoid robotics

Problem

VLA models for bipedal humanoids struggle to align with a new action space because high-quality demonstration data is scarce, particularly for wide-range, whole-body manipulation beyond tabletop tasks.

Approach

TrajBooster extracts 6D end-effector trajectories from wheeled humanoid datasets, retargets them in simulation to a bipedal robot using a hierarchical controller, and post-pre-trains a VLA with this synthetic data before fine-tuning on minimal real teleoperation data.
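The pairing step of this pipeline can be illustrated with a minimal sketch. It assumes the retargeting stage has already produced one target-robot action per source frame; the names `Triplet` and `build_triplets`, the file names, and the two-element action vectors are hypothetical placeholders for illustration, not the authors' code or data format.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Triplet:
    """One training sample: source observation and instruction, target-robot action."""
    image: str                  # hypothetical: ID or path of a source-robot camera frame
    instruction: str            # language instruction taken from the source dataset
    action: Tuple[float, ...]   # retargeted action for the target (bipedal) robot


def build_triplets(
    frames: Sequence[str],
    instruction: str,
    retargeted_actions: Sequence[Sequence[float]],
) -> List[Triplet]:
    """Pair each source frame with the retargeted action for the same timestep."""
    if len(frames) != len(retargeted_actions):
        raise ValueError("expected one retargeted action per source frame")
    return [
        Triplet(image=f, instruction=instruction, action=tuple(a))
        for f, a in zip(frames, retargeted_actions)
    ]


# Toy episode with made-up values (assumption: actions are flat float vectors).
frames = ["frame_000.png", "frame_001.png"]
actions = [[0.0, 0.1], [0.0, 0.2]]
triplets = build_triplets(frames, "pick up the cup", actions)
```

The length check reflects the one assumption the sketch relies on: source observations and retargeted actions are aligned timestep by timestep.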

Key results

First real-world VLA deployment for bipedal whole-body manipulation using cross-embodiment data
Hierarchical retargeting model successfully maps wheeled trajectories to bipedal whole-body actions
Post-pre-training accelerates VLA adaptation, requiring only 10 minutes of real teleoperation
Achieves robust beyond-tabletop tasks like squatting and cross-height manipulation on Unitree G1

Why it matters

Provides a scalable, data-efficient pipeline for training bipedal humanoid manipulators, reducing reliance on costly same-embodiment teleoperation and advancing real-world humanoid robotics.

Abstract

Recent Vision-Language-Action (VLA) models show potential to generalize across embodiments but struggle to quickly align with a new robot’s action space when high-quality demonstrations are scarce, especially for bipedal humanoids. We present TrajBooster, a cross-embodiment framework that leverages abundant wheeled-humanoid data to boost bipedal VLA. Our key idea is to use end-effector trajectories as a morphology-agnostic interface. TrajBooster (i) extracts 6D dual-arm end-effector trajectories from real-world wheeled humanoids, (ii) retargets them in simulation to Unitree G1 with a whole-body controller trained via a heuristic-enhanced harmonized online DAgger to lift low-dimensional trajectory references into feasible high-dimensional whole-body actions, and (iii) forms heterogeneous triplets that couple source vision/language with target humanoid-compatible actions to post-pre-train a VLA, followed by only 10 minutes of teleoperation data collection on the target humanoid domain. Deployed on Unitree G1, our policy achieves beyond-tabletop household tasks, enabling squatting, cross-height manipulation, and coordinated whole-body motion with markedly improved robustness and generalization. Results show that TrajBooster allows existing wheeled-humanoid data to efficiently strengthen bipedal humanoid VLA performance, reducing reliance on costly same-embodiment data while enhancing action space understanding and zero-shot skill transfer capabilities. For more details, please refer to our webpage: https://jiachengliu3.github.io/TrajBooster.

∗ Equal contribution. † Equal advising. 1 Zhejiang University, 2 Westlake University, 3 Shanghai Jiao Tong University, 4 Shanghai Innovation Institute.

Index terms

Deep Learning Methods; Whole-Body Motion Planning and Control; Dual Arm Manipulation