ICRA 2026

Embodiment-Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body Control

Quanquan Peng, Yunfeng Lin, Yufei Xue, Jiangmiao Pang, Weinan Zhang

AI summary

A single unified policy can control multiple heterogeneous humanoids with rich whole-body commands without per-robot reward tuning, outperforming baselines in accuracy and robustness.
Humanoid control, cross-embodiment learning, policy distillation, reinforcement learning, whole-body control, zero-shot transfer

Problem

Training a single reinforcement learning policy to transfer across diverse humanoid robots is hindered by differences in dynamics, degrees of freedom, and kinematics, while existing methods lack support for rich whole-body behaviors and real-world validation.

Approach

EAGLE runs an iterative generalist-specialist distillation loop. In each cycle, embodiment-specific specialists are forked from the current generalist and fine-tuned on their individual robots, and their skills are then distilled back into the shared generalist. The loop relies on a unified high-dimensional command interface and embodiment-aware observations.
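The fork, fine-tune, and distill cycle described above can be illustrated with a toy numerical sketch. This is not the paper's implementation: the linear "policies", the regression loss standing in for RL refinement, the least-squares distillation step, the one-hot embodiment tag, and every name here (`fine_tune`, `distill`, `eagle_like_loop`, `make_toy_robots`) are assumptions made purely for illustration.

```python
import numpy as np


def make_toy_robots(n_robots=3, base_dim=4, act_dim=2, n_samples=200, seed=0):
    """Build hypothetical robots: shared dynamics plus a per-robot action offset."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=(base_dim, act_dim))
    robots = {}
    for i in range(n_robots):
        x = rng.normal(size=(n_samples, base_dim))
        one_hot = np.zeros((n_samples, n_robots))
        one_hot[:, i] = 1.0  # toy embodiment tag appended to the observation
        obs = np.hstack([x, one_hot])
        offset = rng.normal(size=act_dim)
        robots[f"robot_{i}"] = (obs, x @ shared + offset)
    return robots


def fine_tune(weights, obs, target_actions, lr=0.1, steps=50):
    """Stand-in for RL refinement: gradient descent on a per-robot regression loss."""
    w = weights.copy()  # fork: the specialist starts from the generalist
    for _ in range(steps):
        grad = obs.T @ (obs @ w - target_actions) / len(obs)
        w -= lr * grad
    return w


def distill(specialists, robots):
    """Fit one generalist to all specialists' actions on the pooled observations."""
    pooled_obs = np.vstack([robots[name][0] for name in specialists])
    pooled_act = np.vstack([robots[name][0] @ w for name, w in specialists.items()])
    generalist, *_ = np.linalg.lstsq(pooled_obs, pooled_act, rcond=None)
    return generalist


def eagle_like_loop(robots, obs_dim, act_dim, max_cycles=10, tol=1e-4):
    """Repeat fork -> fine-tune -> distill until the tracking error stops improving."""
    generalist = np.zeros((obs_dim, act_dim))
    history = []
    for _ in range(max_cycles):
        specialists = {
            name: fine_tune(generalist, obs, act) for name, (obs, act) in robots.items()
        }
        generalist = distill(specialists, robots)
        err = float(np.mean([
            np.mean((obs @ generalist - act) ** 2) for obs, act in robots.values()
        ]))
        history.append(err)
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return generalist, history


if __name__ == "__main__":
    robots = make_toy_robots()
    _, history = eagle_like_loop(robots, obs_dim=7, act_dim=2)
    print("mean squared tracking error per cycle:", history)
```

In this toy setting the generalist's error shrinks from one cycle to the next because each specialist starts from the previous generalist rather than from scratch, which mirrors the motivation for iterating the loop instead of distilling once.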

Key results

  • High command-tracking accuracy and robustness across 5 simulated and 4 real-world humanoids
  • Eliminates per-robot reward tuning and network redesign requirements
  • Iterative distillation loop steadily improves generalist and specialist performance until convergence
  • Embodiment-aware observations and representation alignment significantly boost cross-embodiment generalization
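The summary does not say how the embodiment-aware observations are built. One common way to feed robots with different joint counts into a single network is to pad per-joint vectors to a fixed width and append a validity mask. The sketch below shows that idea only; `MAX_DOF`, `pad_joint_vector`, `build_observation`, and the observation layout are hypothetical and may differ from what EAGLE actually does.

```python
import numpy as np

MAX_DOF = 8  # hypothetical upper bound on joint count across the fleet


def pad_joint_vector(values, max_dof=MAX_DOF):
    """Zero-pad a per-joint vector to max_dof and return it with a validity mask."""
    values = np.asarray(values, dtype=np.float32)
    if values.shape[0] > max_dof:
        raise ValueError("robot has more joints than max_dof")
    padded = np.zeros(max_dof, dtype=np.float32)
    mask = np.zeros(max_dof, dtype=np.float32)
    padded[: len(values)] = values
    mask[: len(values)] = 1.0  # 1 marks a real joint, 0 marks padding
    return padded, mask


def build_observation(joint_pos, joint_vel, command, max_dof=MAX_DOF):
    """Assemble a fixed-size observation: positions, velocities, mask, command."""
    pos, mask = pad_joint_vector(joint_pos, max_dof)
    vel, _ = pad_joint_vector(joint_vel, max_dof)
    cmd = np.asarray(command, dtype=np.float32)
    return np.concatenate([pos, vel, mask, cmd])


if __name__ == "__main__":
    # Two hypothetical robots with 5 and 7 joints yield same-sized observations.
    obs_small = build_observation([0.1] * 5, [0.0] * 5, [1.0, 0.0, 0.0])
    obs_large = build_observation([0.2] * 7, [0.0] * 7, [1.0, 0.0, 0.0])
    print(obs_small.shape, obs_large.shape)
```

The mask lets a shared policy distinguish a joint resting at zero from a joint that does not exist on the current robot, which is why it is included alongside the padded values.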

Why it matters

Enables scalable, fleet-level humanoid control, accelerating real-world deployment and reducing the engineering burden of training separate controllers for each robot model.

Abstract

Humanoid Whole-Body Controllers trained with reinforcement learning (RL) have recently achieved remarkable performance, yet many target a single robot embodiment. Variations in dynamics, degrees of freedom (DoFs), and kinematic topology still hinder a single policy from commanding diverse humanoids. Moreover, obtaining a generalist policy that not only transfers across embodiments but also supports richer behaviors (beyond simple walking to squatting, leaning) remains especially challenging. In this work, we tackle these obstacles by introducing EAGLE, an iterative generalist-specialist distillation framework that produces a single unified policy that controls multiple heterogeneous humanoids without per-robot reward tuning. During each cycle, embodiment-specific specialists are forked from the current generalist, refined on their respective robots, and new skills are distilled back into the generalist by training on the pooled embodiment set. Repeating this loop until performance convergence produces a robust Whole-Body Controller validated on robots such as Unitree H1, G1, and Fourier N1. We conducted experiments on five different robots in simulation and four in real-world settings. Through quantitative evaluations, EAGLE achieves high tracking accuracy and robustness compared to other methods, marking a step toward scalable, fleet-level humanoid control.

Index terms

Humanoid and Bipedal Locomotion; Legged Robots; Reinforcement Learning
