M2oE: Modular Mixture of Experts for Multi-Morphology Reinforcement Learning of Modular Robots
Chang Liu, Qinchao Xu, Satoshi Yagi, Satoshi Yamamori, Yaonan Zhu, Yusuke Iwasawa, Kazuya Yoshida, Jun Morimoto
AI summary
Problem
Training reinforcement learning policies for modular robots is hindered by morphological diversity, which causes gradient conflicts, and by a lack of simulators that support concurrent multi-morphology training.
Approach
The authors introduce M2oE, a backbone network that mirrors the modular structure of the robot. It uses a shared pool of experts and an attention-based gating mechanism to dynamically route module-specific inputs, and it is paired with an Isaac Lab extension for concurrent multi-morphology training in simulation.
Key results
- Higher learning efficiency and lower tracking error than MLP and Transformer baselines
- Single policy successfully controls Minimal, Dragon, and Tricycle Moonbot morphologies simultaneously
- Module-wise parallelizable architecture mitigates gradient conflicts across configurations
- Demonstrates zero-shot generalization to unseen wave terrains
Why it matters
Provides a scalable framework for training adaptable modular robots, accelerating development for space exploration and other complex reconfigurable robotic applications.
Abstract
Modular robots offer a promising solution for building versatile and adaptable robotic systems. For instance, space exploration robots can be designed to reconfigure to meet diverse task demands across varying environments. However, training such systems with Reinforcement Learning (RL) remains challenging due to the diversity of morphologies and the lack of simulation environments that support simultaneous multi-morphology learning. We present Modular Mixture of Experts (M2oE), a novel RL backbone network that imitates the modular structure of robots to enable efficient and module-wise parallelizable policy learning for modular robots. In M2oE, a shared pool of experts, combined with an attention-based gating mechanism that dynamically selects experts based on inter-module correlations, enables both specialization and generalization. This structure supports training across multiple morphologies within a single framework, avoiding gradient conflicts and enhancing experience sharing across modules and morphologies. To support training, we also extend the Isaac Lab simulator with multi-morphology support that enables concurrent training across diverse robot configurations. Experiments on a space-exploration-inspired modular robot, Moonbot, demonstrate that M2oE significantly improves learning efficiency and achieves superior performance compared to both MLP and Transformer baselines. More information and the project video are available on the project website: https://ryuuchou17.github.io/m2oe/
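The abstract describes the gating mechanism only at a high level. The following is a minimal, hypothetical sketch of the general idea, not the authors' implementation. Each module's input vector attends to the other modules' inputs, so the resulting context reflects inter-module correlations. That context produces a softmax gate over a shared pool of experts, and the module's output is the gate-weighted mix of expert outputs. Several choices here are illustrative assumptions: linear experts, single-head dot-product attention, dense softmax gating (M2oE may instead select a sparse subset of experts), random untrained weights, and the class name `ModularMoESketch`.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class ModularMoESketch:
    """Hypothetical shared-expert layer with attention-based gating (illustrative only)."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Shared pool: every module of every morphology uses these same experts.
        self.experts = [rand_matrix(dim, dim, rng) for _ in range(num_experts)]
        # Query/key/value projections for attention across modules.
        self.Wq = rand_matrix(dim, dim, rng)
        self.Wk = rand_matrix(dim, dim, rng)
        self.Wv = rand_matrix(dim, dim, rng)
        # Maps each module's attention context to one logit per expert.
        self.Wg = rand_matrix(num_experts, dim, rng)

    def gates(self, modules):
        """Return one probability vector over the experts for each module."""
        q = [matvec(self.Wq, m) for m in modules]
        k = [matvec(self.Wk, m) for m in modules]
        v = [matvec(self.Wv, m) for m in modules]
        scale = 1.0 / math.sqrt(self.dim)
        result = []
        for qi in q:
            # Attention weights of this module over all modules.
            attn = softmax([dot(qi, kj) * scale for kj in k])
            context = [sum(a * vj[d] for a, vj in zip(attn, v)) for d in range(self.dim)]
            result.append(softmax(matvec(self.Wg, context)))
        return result

    def forward(self, modules):
        """Mix shared expert outputs per module; accepts any number of modules."""
        outputs = []
        for m, g in zip(modules, self.gates(modules)):
            expert_outs = [matvec(E, m) for E in self.experts]
            mixed = [sum(gi * eo[d] for gi, eo in zip(g, expert_outs)) for d in range(self.dim)]
            outputs.append(mixed)
        return outputs


if __name__ == "__main__":
    layer = ModularMoESketch(dim=4, num_experts=3)
    # Placeholder per-module observation vectors (not real Moonbot data).
    three_modules = [[0.1, 0.2, 0.0, -0.1], [0.3, -0.2, 0.5, 0.0], [0.0, 0.0, 0.1, 0.4]]
    five_modules = three_modules + [[0.2, 0.2, 0.2, 0.2], [-0.3, 0.1, 0.0, 0.0]]
    print(len(layer.forward(three_modules)), len(layer.forward(five_modules)))  # prints "3 5"
```

Because the same experts and projections are applied to every module, the layer accepts any number of modules. This is one plausible way a single network could serve morphologies with different module counts, with each module's computation running independently once the gates are known.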