ICRA 2026

Mixture-Of-Experts Policy for Smooth and Stable Multi-Posture Fall Recovery in Bipedal Robot

Haomin Rong, Yuying Chen, Zhiyong Xu, Lijie Xie, Qingyu Yan, Hui Cheng

AI summary

A unified Mixture-of-Experts reinforcement learning policy enables bipedal robots to robustly and smoothly recover from diverse fall postures with zero-shot real-world deployment.

Mixture-of-Experts, Fall Recovery, Bipedal Robots, Reinforcement Learning, Zero-Shot Transfer, Robust Control

Problem

Bipedal robots are highly susceptible to falls, yet existing recovery methods rely on posture-specific strategies or lack robustness, struggling to generalize across diverse initial configurations and often producing unstable motions.

Approach

The authors train a single MoE policy that dynamically routes recovery tasks to specialized experts based on estimated base height and proprioceptive history, guided by temporally optimized rewards and curriculum learning to ensure smooth, stable stand-ups.
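The routing idea can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MoEPolicy`, the linear experts, the dense softmax gate, and all dimensions are assumptions chosen for illustration. The summary does not say how the base height estimate is produced or whether the gate is soft or sparse. Here the gate sees the proprioceptive history plus an estimated base height, and the output action is the gate-weighted blend of the expert outputs.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Affine map; `weights` holds one row per output dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class MoEPolicy:
    """Hypothetical soft-gated mixture of linear experts (illustrative only)."""

    def __init__(self, obs_dim, act_dim, num_experts, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # Assumption: the gate input is the history plus one height scalar.
        self.gate_w = rand_matrix(num_experts, obs_dim + 1)
        self.gate_b = [0.0] * num_experts
        # Assumption: each expert is a single linear layer (a placeholder
        # for whatever network the paper actually uses).
        self.expert_w = [rand_matrix(act_dim, obs_dim)
                         for _ in range(num_experts)]
        self.expert_b = [[0.0] * act_dim for _ in range(num_experts)]

    def act(self, proprio_history, base_height):
        """Return (action, gate_weights) for one control step."""
        gate_input = list(proprio_history) + [base_height]
        gate = softmax(linear(self.gate_w, self.gate_b, gate_input))
        expert_actions = [linear(w, b, proprio_history)
                          for w, b in zip(self.expert_w, self.expert_b)]
        act_dim = len(expert_actions[0])
        # Blend expert outputs with the gate weights.
        action = [sum(g * a[j] for g, a in zip(gate, expert_actions))
                  for j in range(act_dim)]
        return action, gate


# Usage: a flattened 6-dimensional history and a low estimated base height.
policy = MoEPolicy(obs_dim=6, act_dim=3, num_experts=4)
action, gate = policy.act([0.1] * 6, base_height=0.2)
```

Because the gate weights form a probability distribution, the blended action changes continuously as the estimated base height changes during a stand-up. A hard switch between posture-specific controllers would not have this property.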

Key results

Zero-shot transfer to real hardware across diverse fall postures
Smooth, hardware-compatible recovery via temporally optimized rewards and velocity constraints
Enhanced standing stability using an adaptive tracking factor to mitigate oscillations
Consistent recovery under repeated external disturbances and on inclined slopes

Why it matters

Provides a scalable, robust solution for real-world bipedal robot deployment by eliminating the need for posture-specific controllers and enabling reliable fall recovery in unstructured environments.

Abstract

Bipedal robots are inherently prone to falling due to their higher center of mass and narrower support polygon, making automatic fall recovery a long-standing challenge. Existing approaches often rely on posture-specific strategies or exhibit limited robustness and generalization, restricting their real-world applicability. We present a unified Mixture-of-Experts (MoE) framework that trains a single policy capable of recovering from diverse fallen configurations. By leveraging base height estimation and proprioceptive history within a gating mechanism, the framework dynamically allocates recovery tasks to specialized experts, yielding smooth and stable motions. Extensive real-world experiments show that the policy transfers zero-shot to hardware and consistently achieves recovery not only under repeated disturbances, but also from highly challenging postures and even on inclined slopes, demonstrating robustness and generalization beyond prior methods.

Index terms

Failure Detection and Recovery, Reinforcement Learning, Legged Robots