Learn to Teach: Sample-Efficient Privileged Learning for Humanoid Locomotion Over Real-World Uneven Terrain
Feiyang Wu, Xavier Nal, Jaehwi Jang, Wei Zhu, Zhaoyuan Gu, Anqi Wu, Ye Zhao
AI summary
Problem
Conventional teacher-student reinforcement learning for robotics relies on a decoupled two-stage process that discards valuable teacher interaction data and suffers from an imitation gap, resulting in high sample complexity and unreliable sim-to-real transfer.
Approach
The Learn-to-Teach (L2T) framework co-trains teacher and student policies in a single interactive stage, using a dynamic sample-mixing strategy to recycle simulator data and bridge the teacher-student observation gap.
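The summary does not say how the "dynamic sample-mixing strategy" chooses between teacher-collected and student-collected samples. The sketch below is a hypothetical illustration of the general idea, not the authors' method. It assumes a linear schedule (`student_fraction`) that gradually shifts each training batch from teacher rollouts toward student rollouts, and a sampler (`mix_batch`) that draws with replacement from the two buffers. Both names, the linear schedule, and the with-replacement sampling are assumptions made for illustration.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def student_fraction(step: int, total_steps: int,
                     start: float = 0.0, end: float = 1.0) -> float:
    """Hypothetical linear schedule for the share of student-collected samples.

    Moves from `start` at step 0 to `end` at `total_steps`, clamped outside that range.
    """
    if total_steps <= 0:
        return end
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def mix_batch(teacher_buf: Sequence[T], student_buf: Sequence[T],
              batch_size: int, frac_student: float,
              rng: random.Random) -> List[T]:
    """Draw one training batch mixing samples from two rollout buffers.

    round(frac_student * batch_size) samples come from the student buffer and the
    rest from the teacher buffer; sampling is with replacement (an assumption).
    """
    n_student = round(frac_student * batch_size)
    n_teacher = batch_size - n_student
    if (n_student and not student_buf) or (n_teacher and not teacher_buf):
        raise ValueError("a buffer needed for this batch is empty")
    batch = rng.choices(student_buf, k=n_student) + rng.choices(teacher_buf, k=n_teacher)
    rng.shuffle(batch)  # avoid ordering the batch by data source
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    teacher = [("teacher", i) for i in range(100)]
    student = [("student", i) for i in range(100)]
    for step in (0, 500, 1000):
        frac = student_fraction(step, 1000)
        batch = mix_batch(teacher, student, 8, frac, rng)
        n_from_student = sum(src == "student" for src, _ in batch)
        print(step, frac, n_from_student)
```

Under this assumed schedule, early batches reuse only teacher interaction data, so none of it is discarded. Later batches are dominated by the student's own rollouts, which is one simple way to expose the student to the states it actually visits.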
Key results
- 50% reduction in training samples compared to conventional teacher-student paradigms
- Zero-shot sim-to-real transfer on the physical Digit humanoid robot
- Robust locomotion across 12+ diverse real-world terrains without depth estimation modules
- Effective mitigation of the teacher-student imitation gap via dynamic sample mixing
Why it matters
L2T significantly lowers the computational and data barriers to deploying robust, simulation-trained robotic policies in the real world, accelerating the practical adoption of humanoid robots in unstructured environments.
Abstract
Humanoid robots promise transformative capabilities for industrial and service applications. While recent advances in Reinforcement Learning (RL) have yielded impressive results in locomotion, manipulation, and navigation, the proposed methods typically require enormous numbers of simulation samples to account for real-world variability. This work proposes a novel one-stage training framework, Learn to Teach (L2T), which unifies teacher and student policy learning. Our approach recycles simulator samples and synchronizes the learning trajectories through shared dynamics, significantly reducing sample complexity and training time while achieving state-of-the-art performance. Furthermore, we validate the RL variant (L2T-RL) through extensive simulations and hardware tests on the Digit robot, demonstrating zero-shot sim-to-real transfer and robust performance over 12+ diverse terrains without depth estimation modules.