ICRA 2026

MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies

Chengbo Yuan, Rui Zhou, Mengzhen Liu, Yingdong Hu, Shengjie Wang, Li Yi, Chuan Wen, Shanghang Zhang, Yang Gao

AI summary

Human VR demonstrations can be directly transformed and cotrained with robot data to enable end-to-end policies to perform new manipulation tasks on real robots without task-specific robot data.

Human-robot transfer, VR motion data, End-to-end policies, Motion-level learning, Multi-task cotraining, Imitation learning

Problem

Collecting large-scale real-robot manipulation data is costly and labor-intensive, while it remains unclear whether abundant human VR data can directly transfer actionable motion knowledge to robot policies without intermediate representations.

Approach

MotionTrans maps VR-tracked human poses into a robot-compatible observation-action space and jointly trains end-to-end policies on balanced human and robot datasets through a weighted multi-task cotraining strategy.
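The weighted cotraining idea can be illustrated with a minimal sampling sketch. The paper summary does not specify the weighting scheme, so this assumes the simplest balanced variant: each sample is weighted inversely to its task's dataset size, so every human and robot task contributes equally to a training batch. The task names, dataset sizes, and function names below are hypothetical.

```python
import random
from collections import Counter


def balanced_task_weights(task_sizes):
    """Per-sample weight for each task so that every task carries equal total mass.

    Assumption: inverse-size weighting. The actual MotionTrans weights may differ.
    """
    n_tasks = len(task_sizes)
    return {task: 1.0 / (n_tasks * size) for task, size in task_sizes.items()}


def sample_batch(samples, weights, batch_size, rng):
    """Draw (task, index) pairs with replacement, proportional to per-sample weight."""
    probs = [weights[task] for task, _ in samples]
    return rng.choices(samples, weights=probs, k=batch_size)


# Hypothetical dataset sizes: two human VR tasks and one robot task.
task_sizes = {"human/pour": 400, "human/wipe": 100, "robot/pick": 50}
samples = [(task, i) for task, n in task_sizes.items() for i in range(n)]

weights = balanced_task_weights(task_sizes)
rng = random.Random(0)
batch = sample_batch(samples, weights, 3000, rng)

# Each task should appear roughly equally often despite unequal dataset sizes.
counts = Counter(task for task, _ in batch)
print(counts)
```

With inverse-size weights, a task with 50 demonstrations is drawn about as often as one with 400, which keeps the smaller robot dataset from being swamped by the larger human one. In a real training loop the same weights could be handed to a framework-level weighted sampler instead of `random.choices`.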

Key results

9 out of 15 human tasks achieve non-trivial zero-shot success rates when directly deployed on a robot
Policies learn meaningful task-directed motions even for unsuccessful tasks (average progress score ~0.5)
Joint cotraining with robot data is essential, as human-only training yields 0% zero-shot success
Few-shot finetuning with a small number of robot demonstrations boosts average success rates by 40%

Why it matters

It provides evidence that human VR data can serve as a scalable, direct source of motion priors for robot policies, which could reduce the need for expensive real-robot data collection.

Abstract

Scaling real robot data is a key bottleneck in imitation learning, leading to the use of auxiliary data for policy training. While other aspects of robotic manipulation such as image or language understanding may be learned from internet-based datasets, acquiring motion knowledge remains challenging. Human data, with its rich diversity of manipulation behaviors, offers a valuable resource for this purpose. While previous works show that using human data can bring benefits, such as improving robustness and training efficiency, it remains unclear whether it can realize its greatest advantage: enabling robot policies to directly learn new motions for task completion. In this paper, we systematically explore this potential through multi-task human-robot cotraining. We introduce MotionTrans, a framework that includes a data collection system, a human data transformation pipeline, and a weighted cotraining strategy. By cotraining 30 human-robot tasks simultaneously, we directly transfer motions of 13 tasks from human data to deployable end-to-end robot policies. Notably, 9 tasks achieve non-trivial success rates in a zero-shot manner. MotionTrans also significantly enhances pretraining-finetuning performance (+40% success rate). These findings unlock the potential of motion-level learning from human data, offering insights into its effective use for training robotic manipulation policies. All data, code, and model weights will be open-sourced.

Index terms

Transfer Learning; Learning from Demonstration; Data Sets for Robot Learning