ICRA 2026

Learn to Teach: Sample-Efficient Privileged Learning for Humanoid Locomotion Over Real-World Uneven Terrain

Feiyang Wu, Xavier Nal, Jaehwi Jang, Wei Zhu, Zhaoyuan Gu, Anqi Wu, Ye Zhao

AI summary

A unified one-stage teacher-student training framework that recycles simulator samples and synchronizes learning trajectories, cutting sample complexity by 50% while enabling zero-shot sim-to-real humanoid locomotion across diverse real-world terrains.

Reinforcement learning; Sim-to-real transfer; Teacher-student learning; Humanoid locomotion; Sample efficiency; Privileged information

Problem

Conventional teacher-student reinforcement learning for robotics relies on a decoupled two-stage process that discards valuable teacher interaction data and suffers from an imitation gap, resulting in high sample complexity and unreliable sim-to-real transfer.

Approach

The Learn-to-Teach (L2T) framework co-trains teacher and student policies in a single interactive stage, using a dynamic sample-mixing strategy to recycle simulator data and bridge the teacher-student observation gap.
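The summary does not specify how the sample mixing is scheduled, so the following is only a minimal sketch of the general idea, assuming a linear schedule over training. It shows one way a training batch could combine transitions collected under the teacher with transitions collected under the student, so that teacher data is reused rather than discarded. The names mixing_ratio and mixed_batch, the start and end fractions, and the tuple-based samples are all hypothetical and are not taken from the paper.

```python
import random


def mixing_ratio(step, total_steps, start=1.0, end=0.5):
    """Fraction of each batch drawn from teacher rollouts.

    Assumption: a linear schedule from `start` to `end`; the paper's
    actual schedule is not given in this summary.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def mixed_batch(teacher_samples, student_samples, batch_size, ratio, rng):
    """Draw one batch that mixes teacher and student transitions."""
    # Number of teacher transitions, capped by what is available.
    n_teacher = min(round(batch_size * ratio), len(teacher_samples))
    # Fill the remainder with student transitions, also capped.
    n_student = min(batch_size - n_teacher, len(student_samples))
    batch = rng.sample(teacher_samples, n_teacher)
    batch += rng.sample(student_samples, n_student)
    rng.shuffle(batch)  # avoid ordering the batch by data source
    return batch


# Placeholder transitions tagged by the policy that collected them.
teacher_samples = [("teacher", i) for i in range(10)]
student_samples = [("student", i) for i in range(10)]

rng = random.Random(0)
ratio = mixing_ratio(step=50, total_steps=100)
batch = mixed_batch(teacher_samples, student_samples, 8, ratio, rng)
```

In this sketch the teacher's share of each batch shrinks as training progresses, which is one plausible reading of "dynamic"; a schedule driven by the student's performance would fit the description equally well.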

Key results

50% reduction in training samples compared to conventional teacher-student paradigms
Zero-shot sim-to-real transfer on the physical Digit humanoid robot
Robust locomotion across 12+ diverse real-world terrains without depth estimation modules
Effective mitigation of the teacher-student imitation gap via dynamic sample mixing

Why it matters

It significantly lowers the computational and data barriers for deploying robust, sim-trained robotic policies in the real world, accelerating the practical adoption of humanoid robots in unstructured environments.

Abstract

Humanoid robots promise transformative capabilities for industrial and service applications. While recent advances in Reinforcement Learning (RL) yield impressive results in locomotion, manipulation, and navigation, the proposed methods typically require enormous simulation samples to account for real-world variability. This work proposes a novel one-stage training framework, Learn to Teach (L2T), which unifies teacher and student policy learning. Our approach recycles simulator samples and synchronizes the learning trajectories through shared dynamics, significantly reducing sample complexities and training time while achieving state-of-the-art performance. Furthermore, we validate the RL variant (L2T-RL) through extensive simulations and hardware tests on the Digit robot, demonstrating zero-shot sim-to-real transfer and robust performance over 12+ diverse terrains without depth estimation modules.

Index terms

Humanoid and Bipedal Locomotion; Reinforcement Learning; Legged Robots