ICRA 2026

CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots

Shihao Ma, Hongjin Chen, Zijun Xu, Yi Zhao, Ke Wu, Ruichen Yang, Leyao Zou, Zhongxue Gan, Wenchao Ding

AI summary

Integrating contrastive learning into a Mixture of Experts framework enables humanoid robots to dynamically specialize and adapt to complex, mixed terrains in a single training stage.

Humanoid locomotion, Mixture of Experts, Contrastive learning, Reinforcement learning, Terrain adaptation, Robotics

Problem

Vanilla Mixture of Experts models suffer from 'lazy gating,' where expert activations remain nearly uniform across different terrains, preventing effective terrain specialization and limiting adaptability in complex, heterogeneous environments.
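One way to make "lazy gating" concrete is to compare the average gate activation per terrain. The sketch below is illustrative only and is not taken from the paper: the function names, the choice of metrics (normalized entropy and cross-terrain cosine similarity), and the sample activations are all assumptions. Entropy near 1 together with cosine similarity near 1 across terrains would indicate that the gate treats every terrain alike.

```python
import math


def mean_vector(rows):
    """Element-wise mean of equally sized vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def normalized_entropy(p):
    """Entropy of a gate distribution divided by log(num_experts); 1.0 means uniform."""
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(len(p))


def gating_report(gates_by_terrain):
    """gates_by_terrain maps a terrain name to a list of gate vectors (each sums to 1)."""
    means = {t: mean_vector(g) for t, g in gates_by_terrain.items()}
    names = sorted(means)
    # Cosine similarity between the mean gate vectors of every terrain pair.
    sims = [
        cosine(means[a], means[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    return {
        "mean_entropy": sum(normalized_entropy(m) for m in means.values()) / len(means),
        "mean_cross_terrain_cosine": sum(sims) / len(sims),
    }


# Hypothetical gate activations for a three-expert policy (made-up numbers).
lazy = {
    "stairs": [[0.34, 0.33, 0.33], [0.33, 0.34, 0.33]],
    "gap": [[0.33, 0.33, 0.34], [0.34, 0.33, 0.33]],
}
specialized = {
    "stairs": [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],
    "gap": [[0.05, 0.90, 0.05], [0.10, 0.80, 0.10]],
}

lazy_report = gating_report(lazy)
specialized_report = gating_report(specialized)
```

On these made-up inputs, the lazy gate scores close to 1 on both metrics, while the specialized gate shows lower entropy and much lower cross-terrain similarity.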

Approach

The authors propose CMoE, a single-stage reinforcement learning framework that combines a Mixture of Experts policy with a contrastive learning objective to align expert activation distributions with terrain-specific features, encouraging clear specialization.
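The combination described above can be sketched in a few lines. This summary does not give the paper's exact loss, so the code below substitutes a generic supervised InfoNCE-style objective over gate activations, with terrain labels defining the positive pairs. The function names, the temperature value, and the sample data are assumptions for illustration. In a training loop, a term like this would presumably be added to the reinforcement learning loss with a weighting coefficient.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def moe_action(gate_logits, expert_actions):
    """Blend per-expert action vectors using softmax gate weights."""
    w = softmax(gate_logits)
    dim = len(expert_actions[0])
    return [sum(w[k] * expert_actions[k][d] for k in range(len(w))) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def gate_contrastive_loss(gates, terrain_ids, temperature=0.1):
    """Supervised InfoNCE-style loss: gate vectors from the same terrain are positives.

    Assumed stand-in for the paper's contrastive objective, not the authors' formula.
    """
    n = len(gates)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and terrain_ids[j] == terrain_ids[i]]
        if not positives:
            continue  # an anchor without a same-terrain partner contributes nothing
        logits = {j: cosine(gates[i], gates[j]) / temperature for j in range(n) if j != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability of picking a same-terrain sample.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical batch: two samples from terrain 0, two from terrain 1.
terrain_ids = [0, 0, 1, 1]
lazy_gates = [[0.34, 0.33, 0.33], [0.33, 0.34, 0.33], [0.33, 0.33, 0.34], [0.34, 0.33, 0.33]]
specialized_gates = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.05, 0.90, 0.05], [0.10, 0.80, 0.10]]

loss_lazy = gate_contrastive_loss(lazy_gates, terrain_ids)
loss_specialized = gate_contrastive_loss(specialized_gates, terrain_ids)
blended = moe_action([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Near-uniform gates give a higher loss than terrain-specific gates, so minimizing this term pushes the gating network toward consistent activations within a terrain and distinct activations across terrains.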

Key results

Achieves state-of-the-art success rates and travel distances across 8 diverse terrains in simulation.
Enables a physical Unitree G1 robot to traverse continuous steps up to 20 cm high and gaps up to 80 cm wide.
Establishes clear, terrain-specific clustering of expert activations via contrastive learning.
Commits to releasing the code publicly to support further research and community adoption.

Why it matters

It provides a scalable, single-stage training paradigm that significantly improves real-world humanoid locomotion on complex, heterogeneous terrains, benefiting robotics researchers and developers.

Abstract

For effective deployment in real-world environments, humanoid robots must autonomously navigate a diverse range of complex terrains with abrupt transitions. While the vanilla mixture of experts (MoE) framework is theoretically capable of modeling diverse terrain features, in practice, the gating network exhibits nearly uniform expert activations across different terrains, weakening the expert specialization and limiting the model’s expressive power. To address this limitation, we introduce CMoE, a novel single-stage reinforcement learning framework that integrates contrastive learning to refine expert activation distributions. By imposing contrastive constraints, CMoE maximizes the consistency of expert activations within the same terrain while minimizing their similarity across different terrains, thereby encouraging experts to specialize in distinct terrain types. We validated our approach on the Unitree G1 humanoid robot through a series of challenging experiments. Results demonstrate that CMoE enables the robot to traverse continuous steps up to 20 cm high and gaps up to 80 cm wide, while achieving robust and natural gait across diverse mixed terrains, surpassing the limits of existing methods. To support further research and foster community development, we will release our code publicly.

1 College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China, 200433.

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62403142, in part by the Science and Technology Commission of Shanghai Municipality under Grant 24511103100, and in part by the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103).

∗ Corresponding authors: Wenchao Ding and Zhongxue Gan.

Project Page: https://hoshi-no-ai.github.io/CMoE

Index terms

Humanoid and Bipedal Locomotion, Reinforcement Learning, Legged Robots