ICRA 2026

Teaching to Individual Needs: Bidirectional Teacher-Student Learning for Wheeled-Legged Locomotion

Guangsheng Li, Charles Wu, XinHua Zheng, Shiyu Zhu, Shenglan Liu

AI summary

The proposed bidirectional Teacher-Student framework significantly improves training efficiency and real-world traversability for wheeled-legged robots by resolving action distribution mismatches and enhancing imitation feasibility.

Wheeled-legged locomotion, Teacher-Student learning, Reinforcement learning, Sim-to-real transfer, Mixture density network, Imitation learning

Problem

Applying the conventional Teacher-Student reinforcement learning paradigm to wheeled-legged robots suffers from multimodal confusion, which causes the student to average out diverse action modes, and low imitability, where the teacher generates actions the student cannot reliably reproduce.
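
The averaging effect can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a 1-D action, hypothetical teacher samples, and a student trained with a plain mean-squared-error loss, and it shows that the loss-optimal prediction lands between the two modes, on an action the teacher never took.

```python
# Hypothetical 1-D teacher actions recorded for one and the same student observation:
# three samples from a "negative" mode and two from a "positive" mode.
teacher_actions = [-1.0, -1.0, -1.0, 1.0, 1.0]


def mse(prediction, targets):
    """Mean squared error of a single constant prediction against all targets."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)


# The constant that minimises MSE is the sample mean of the targets.
averaged_action = sum(teacher_actions) / len(teacher_actions)

# Distance from the averaged action to the closest action the teacher actually took.
gap_to_nearest_mode = min(abs(averaged_action - t) for t in teacher_actions)

print("averaged action:", averaged_action)
print("gap to nearest teacher mode:", gap_to_nearest_mode)
print("MSE of averaged action:", mse(averaged_action, teacher_actions))
print("MSE of committing to the -1.0 mode:", mse(-1.0, teacher_actions))
```

Under these assumptions the averaged action scores a lower MSE than committing to either mode, which is why a unimodal regression student drifts toward it.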

Approach

The authors propose a bidirectional Teacher-Student framework that uses a mixture density network to explicitly model and select the student's dominant action modes, alongside an imitation-aware reward that guides the teacher to generate actions the student can reliably reproduce.
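
As a rough sketch of the mode-selection idea, the snippet below picks the mean of the highest-weight mixture component instead of blending all components. The function names, the softmax over mixture logits, and the example numbers are assumptions made for illustration; the summary does not specify the actual HWC-MDN architecture or its outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax that turns mixture logits into weights."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def highest_weight_action(logits, means):
    """Return the mean of the component with the largest mixture weight (hypothetical helper)."""
    weights = softmax(logits)
    best = max(range(len(weights)), key=weights.__getitem__)
    return means[best], weights


def blended_action(logits, means):
    """Weighted average of all component means, shown only for contrast."""
    weights = softmax(logits)
    dims = len(means[0])
    return [sum(w * m[d] for w, m in zip(weights, means)) for d in range(dims)]


# Hypothetical mixture head output: two components over a 2-D action.
logits = [0.0, 1.0]
means = [[-1.0, 0.2], [1.0, -0.2]]

action, weights = highest_weight_action(logits, means)
print("selected action:", action)
print("blended action:", blended_action(logits, means))
```

Selecting a single component keeps the output on one of the modelled modes, whereas the blended action falls between them.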

Key results

Explicit modeling of multimodal action distributions via HWC-MDN to prevent behavioral averaging
Imitation-Aware Reward using Mahalanobis distance to guide teacher action reproducibility
Significantly improved training efficiency and traversability in simulation
Real-world MagicDog-W navigation of 45 cm obstacles and 45° slopes using only proprioception
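
One way a Mahalanobis-distance reward could look is sketched below. Only the use of the Mahalanobis distance comes from the summary; the diagonal covariance, the exponential shaping, the `scale` parameter, and all function names are assumptions for illustration and may differ from the paper's Imitation-Aware Reward.

```python
import math


def mahalanobis_diag(action, mean, var):
    """Mahalanobis distance between an action and a Gaussian with diagonal covariance."""
    return math.sqrt(sum((a - m) ** 2 / v for a, m, v in zip(action, mean, var)))


def imitation_aware_reward(teacher_action, student_mean, student_var, scale=1.0):
    """Hypothetical shaping: reward in (0, 1] that decays as the teacher action
    moves away from what the student's predicted distribution can reproduce."""
    distance = mahalanobis_diag(teacher_action, student_mean, student_var)
    return math.exp(-scale * distance)


# Hypothetical student prediction for a 2-D action (mean and per-dimension variance).
student_mean = [0.0, 0.0]
student_var = [1.0, 4.0]

near = imitation_aware_reward([0.1, 0.2], student_mean, student_var)
far = imitation_aware_reward([2.0, 4.0], student_mean, student_var)
print("reward for a nearby teacher action:", near)
print("reward for a distant teacher action:", far)
```

Dividing by the variance means that deviations along dimensions where the student is uncertain are penalised less than equally sized deviations along dimensions it predicts confidently.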

Why it matters

It enables robust sim-to-real locomotion for complex wheeled-legged robots in unstructured environments, advancing practical deployment for search-and-rescue and inspection tasks.

Abstract

Reinforcement Learning (RL) enables robust and adaptive locomotion in legged and wheeled-legged robots. A common approach is the Teacher-Student (TS) paradigm, in which a teacher policy with privileged information supervises a proprioceptive student. While the TS paradigm has proven effective on legged robots, we encounter two critical issues when applying it to wheeled-legged robots. One issue is multimodal confusion, where teacher actions become multimodal under the student's proprioceptive observations, resulting in the student generating averaged action modes. The other is low imitability of teacher actions, as the teacher overlooks whether the student can reproduce them. To address these issues, we propose Teaching to Individual Needs (TIN), a bidirectional TS framework. To mitigate multimodal confusion within the student policy, we design a Highest-Weight Component Mixture Density Network (HWC-MDN). By utilizing HWC-MDN, the TIN student can explicitly model multimodal action distributions and output the highest-weight component. To improve imitability, we propose an Imitation-Aware Reward (IAR) that encourages the teacher to generate actions that the student can reproduce more reliably. Simulation experiments show that TIN significantly improves both training efficiency and traversability. Real-world tests illustrate that TIN enables the wheeled-legged robot MagicDog-W to traverse 45 cm obstacles and ascend 45° slopes.

Index terms

Legged Robots, Reinforcement Learning, Imitation Learning