ICRA 2026

KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control

Jinrui Han, Weiji Xie, Jiakun Zheng, Jiyuan Shi, Weinan Zhang, Ting Xiao, Chenjia Bai

AI summary

A single universal policy can accurately and stably imitate diverse, long-horizon humanoid motions by combining orthogonal expert routing with hybrid tracking objectives.

Humanoid control, motion imitation, mixture-of-experts, sim-to-real, whole-body tracking, reinforcement learning

Problem

Learning a single policy for diverse humanoid motions is hindered by limited network expressiveness and the difficulty of balancing local motion fidelity with global trajectory stability over long sequences.

Approach

VMS, the paper's unified whole-body controller, uses an Orthogonal Mixture-of-Experts (OMoE) architecture to disentangle skill representations. Training is guided by a hybrid tracking objective and a segment-level reward that relaxes rigid step-wise matching, making long-horizon execution more robust.
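
The summary does not say how the experts are kept orthogonal, so the following is only a minimal sketch of one plausible reading: each expert produces a feature vector, the vectors are orthonormalized with Gram-Schmidt, and a softmax router mixes them. The function names (`gram_schmidt`, `omoe_forward`), the layer shapes, and the tanh activation are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def gram_schmidt(vectors, eps=1e-8):
    """Orthonormalize the rows of `vectors` (shape: n_experts x feat_dim)."""
    basis = []
    for v in vectors:
        w = v.astype(float).copy()
        for b in basis:
            w -= np.dot(w, b) * b  # remove the component along earlier experts
        norm = np.linalg.norm(w)
        basis.append(w / norm if norm > eps else np.zeros_like(w))
    return np.stack(basis)

def omoe_forward(obs, expert_weights, router_weights):
    """Hypothetical OMoE layer.

    obs:            (obs_dim,) observation vector
    expert_weights: (n_experts, feat_dim, obs_dim), one linear map per expert
    router_weights: (n_experts, obs_dim), linear router
    """
    expert_out = np.tanh(expert_weights @ obs)   # (n_experts, feat_dim)
    ortho = gram_schmidt(expert_out)             # mutually orthogonal expert features
    logits = router_weights @ obs
    gate = np.exp(logits - logits.max())         # numerically stable softmax
    gate /= gate.sum()
    return gate @ ortho, ortho, gate             # mixed feature, experts, routing weights

# Usage with random (untrained) weights, purely to show the shapes involved.
rng = np.random.default_rng(0)
obs = rng.normal(size=8)
W = 0.3 * rng.normal(size=(4, 16, 8))
R = rng.normal(size=(4, 8))
mixed, ortho, gate = omoe_forward(obs, W, R)
```

Orthogonalizing the expert outputs forces each expert to contribute a direction the others do not cover, which is one way to interpret "disentangled" skill representations. In a trained policy the mixed feature would feed an action head, which is omitted here.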

Key results

Orthogonal expert routing disentangles skill representations, improving expressiveness and generalization.
Hybrid tracking with segment-level rewards minimizes long-horizon drift and stabilizes minute-long sequences.
VMS outperforms baseline methods in tracking accuracy and success rate across diverse simulated motions.
VMS was successfully deployed on a real Unitree G1 robot, where it robustly imitated dynamic and complex skills.
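
The segment-level reward mentioned in the results above can be pictured as follows. The summary only states that it relaxes rigid step-wise matching, so this sketch assumes one simple form: the robot is scored against the best-matching reference frame inside a short window around the current time step, rather than against frame t alone. The names `stepwise_reward` and `segment_reward`, the window size, and the Gaussian kernel width `sigma` are hypothetical.

```python
import numpy as np

def stepwise_reward(robot_pos, ref_traj, t, sigma=0.25):
    """Rigid matching: compare against reference frame t only."""
    err = np.sum((robot_pos - ref_traj[t]) ** 2)
    return float(np.exp(-err / sigma**2))

def segment_reward(robot_pos, ref_traj, t, half_window=5, sigma=0.25):
    """Relaxed matching (assumed form): use the closest frame in [t-k, t+k]."""
    lo = max(0, t - half_window)
    hi = min(len(ref_traj), t + half_window + 1)
    errs = np.sum((ref_traj[lo:hi] - robot_pos) ** 2, axis=-1)
    return float(np.exp(-errs.min() / sigma**2))

# Usage: a reference root trajectory moving along x at 0.1 m per frame,
# and a robot that executes the motion correctly but lags by three frames.
ref_traj = np.stack(
    [0.1 * np.arange(50), np.zeros(50), np.full(50, 0.8)], axis=1
)
t = 20
robot_pos = ref_traj[t - 3]
r_step = stepwise_reward(robot_pos, ref_traj, t)
r_seg = segment_reward(robot_pos, ref_traj, t)
```

Because the window always contains frame t, the segment reward in this sketch is never lower than the step-wise one. A small timing lag therefore stops being penalized as if it were a pose error, which matches the stated goal of tolerating transient inaccuracies.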

Why it matters

Establishes a scalable foundation for general-purpose humanoid robots to reliably execute diverse, human-like behaviors in real-world environments.

Abstract

Learning versatile whole-body skills by tracking various human motions is a fundamental step toward general-purpose humanoid robots. This task is particularly challenging because a single policy must master a broad repertoire of motion skills while ensuring stability over long-horizon sequences. To this end, we present VMS, a unified whole-body controller that enables humanoid robots to learn diverse and dynamic behaviors within a single policy. Our framework integrates a hybrid tracking objective that balances local motion fidelity with global trajectory consistency, and an Orthogonal Mixture-of-Experts (OMoE) architecture that encourages skill specialization while enhancing generalization across motions. A segment-level tracking reward is further introduced to relax rigid step-wise matching, enhancing robustness when handling global displacements and transient inaccuracies. We validate VMS extensively in both simulation and real-world experiments, demonstrating accurate imitation of dynamic skills, stable performance over minute-long sequences, and strong generalization to unseen motions. These results highlight the potential of VMS as a scalable foundation for versatile humanoid whole-body control. The project page is available at kungfubot2-humanoid.github.io.

†Corresponding Author. 1Institute of Artificial Intelligence (TeleAI), China Telecom; 2Shanghai Jiao Tong University; 3East China University of Science and Technology.
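
To make the hybrid tracking objective in the abstract more concrete, here is a minimal sketch under stated assumptions. It scores local fidelity using body keypoints expressed relative to the root, and global consistency using the root position in the world frame, then blends the two. The function name `hybrid_tracking_reward`, the weights, and the kernel widths are hypothetical, and root orientation is ignored for brevity. The paper's actual formulation may differ.

```python
import numpy as np

def hybrid_tracking_reward(body_pos, root_pos, ref_body_pos, ref_root_pos,
                           w_local=0.6, w_global=0.4,
                           sigma_local=0.1, sigma_global=0.5):
    """Blend a root-relative (local) term with a world-frame (global) term.

    body_pos, ref_body_pos: (n_bodies, 3) keypoint positions in the world frame
    root_pos, ref_root_pos: (3,) root positions in the world frame
    """
    # Local term: keypoints relative to the root, so pure drift is not penalized.
    local_diff = (body_pos - root_pos) - (ref_body_pos - ref_root_pos)
    local_err = np.mean(np.sum(local_diff ** 2, axis=-1))
    # Global term: how far the root has drifted from the reference trajectory.
    global_err = np.sum((root_pos - ref_root_pos) ** 2)
    r_local = float(np.exp(-local_err / sigma_local**2))
    r_global = float(np.exp(-global_err / sigma_global**2))
    return w_local * r_local + w_global * r_global, r_local, r_global

# Usage: the pose is reproduced exactly, but the whole robot has drifted 0.5 m in x.
rng = np.random.default_rng(0)
ref_root = np.array([1.0, 0.0, 0.8])
ref_bodies = ref_root + 0.3 * rng.normal(size=(12, 3))
drift = np.array([0.5, 0.0, 0.0])
total, r_local, r_global = hybrid_tracking_reward(
    ref_bodies + drift, ref_root + drift, ref_bodies, ref_root
)
```

In this example the local term stays at its maximum because the pose itself is correct, while the global term drops because of the drift. Weighting the two terms lets a policy keep imitating the motion shape after a displacement while still being pulled back toward the reference trajectory.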

Index terms

Humanoid and Bipedal Locomotion; Whole-Body Motion Planning and Control; Reinforcement Learning