Embodiment‑Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body Control
Quanquan Peng, Yunfeng Lin, Yufei Xue, Jiangmiao Pang, Weinan Zhang
AI summary
Problem
Training a single reinforcement learning policy to transfer across diverse humanoid robots is hindered by differences in dynamics, degrees of freedom, and kinematics, while existing methods lack support for rich whole-body behaviors and real-world validation.
Approach
EAGLE employs an iterative generalist-specialist distillation loop: embodiment-specific specialists are forked from the current generalist and fine-tuned on individual robots, and their skills are then distilled back into the shared generalist. The loop relies on a unified high-dimensional command interface and embodiment-aware observations.
Key results
- High command-tracking accuracy and robustness across 5 simulated and 4 real-world humanoids
- Eliminates per-robot reward tuning and network redesign requirements
- Iterative distillation loop steadily improves generalist and specialist performance until convergence
- Embodiment-aware observations and representation alignment significantly boost cross-embodiment generalization
Why it matters
Enables scalable, fleet-level humanoid control, accelerating real-world deployment and reducing the engineering burden of training separate controllers for each robot model.
Abstract
Humanoid Whole-Body Controllers trained with reinforcement learning (RL) have recently achieved remarkable performance, yet many target a single robot embodiment. Variations in dynamics, degrees of freedom (DoFs), and kinematic topology still hinder a single policy from commanding diverse humanoids. Moreover, obtaining a generalist policy that not only transfers across embodiments but also supports richer behaviors, extending beyond simple walking to squatting and leaning, remains especially challenging. In this work, we tackle these obstacles by introducing EAGLE, an iterative generalist-specialist distillation framework that produces a single unified policy that controls multiple heterogeneous humanoids without per-robot reward tuning. During each cycle, embodiment-specific specialists are forked from the current generalist and refined on their respective robots, and their new skills are distilled back into the generalist by training on the pooled embodiment set. Repeating this loop until performance converges produces a robust Whole-Body Controller validated on robots such as Unitree H1, G1, and Fourier N1. We conducted experiments on five different robots in simulation and four in real-world settings. In quantitative evaluations, EAGLE achieves high tracking accuracy and robustness compared to other methods, marking a step toward scalable, fleet-level humanoid control.
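The fork, refine, and distill cycle described in the abstract can be illustrated with a deliberately tiny toy. The sketch below is not the paper's implementation: the scalar "policies", the quadratic tracking error, the robot descriptors, and every function name are hypothetical stand-ins chosen so the loop runs in plain Python. It only shows the control flow, assuming that each specialist starts from the generalist's current output for its robot and that distillation regresses the generalist onto all specialists over the pooled robot set.

```python
"""Toy sketch of an iterative generalist-specialist distillation loop.

Everything here is hypothetical: scalar "policies" and a quadratic
"tracking error" stand in for RL training on real humanoids.
"""

# Hypothetical embodiment descriptors (e.g. a normalized leg length).
ROBOTS = {"robot_a": 0.5, "robot_b": 1.0, "robot_c": 1.5}


def ideal_gain(descriptor):
    """Assumed per-robot optimum that the toy specialists move toward."""
    return 2.0 * descriptor + 1.0


def generalist_gain(params, descriptor):
    """Embodiment-aware generalist: its output depends on the descriptor."""
    w, b = params
    return w * descriptor + b


def refine_specialist(gain, descriptor, steps=5, lr=0.2):
    """Stand-in for per-robot fine-tuning: gradient steps on squared error."""
    target = ideal_gain(descriptor)
    for _ in range(steps):
        gain -= lr * 2.0 * (gain - target)
    return gain


def distill(params, specialists, steps=200, lr=0.1):
    """Regress the generalist onto every specialist over the pooled robots."""
    w, b = params
    n = len(specialists)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for name, teacher in specialists.items():
            d = ROBOTS[name]
            err = (w * d + b) - teacher
            grad_w += 2.0 * err * d / n
            grad_b += 2.0 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def mean_error(params):
    """Mean squared gap between the generalist and each robot's optimum."""
    errs = [(generalist_gain(params, d) - ideal_gain(d)) ** 2
            for d in ROBOTS.values()]
    return sum(errs) / len(errs)


def run_loop(cycles=20, tol=1e-6):
    """Repeat fork -> refine -> distill until the toy error converges."""
    params = (0.0, 0.0)
    history = [mean_error(params)]
    for _ in range(cycles):
        # Fork: each specialist starts from the generalist's current output.
        specialists = {
            name: refine_specialist(generalist_gain(params, d), d)
            for name, d in ROBOTS.items()
        }
        # Distill: pull the specialists' improvements back into one policy.
        params = distill(params, specialists)
        history.append(mean_error(params))
        if history[-1] < tol:
            break
    return params, history


if __name__ == "__main__":
    final_params, errors = run_loop()
    print("generalist params:", final_params)
    print("error per cycle:", errors)
```

In this toy, the generalist's error shrinks every cycle because each specialist begins from the generalist and only has to close the remaining per-robot gap. A real system would replace `refine_specialist` with RL fine-tuning and `distill` with supervised imitation of the specialists' actions.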