ICRA 2026

M2oE: Modular Mixture of Experts for Multi-Morphology Reinforcement Learning of Modular Robots

Chang Liu, Qinchao Xu, Satoshi Yagi, Satoshi Yamamori, Yaonan Zhu, Yusuke Iwasawa, Kazuya Yoshida, Jun Morimoto

AI summary

M2oE enables a single reinforcement learning policy to efficiently control and generalize across multiple modular robot morphologies by using a shared expert pool and attention-based gating.

Modular robots, Multi-morphology learning, Reinforcement learning, Mixture of Experts, Isaac Lab, Policy generalization

Problem

Training reinforcement learning policies for modular robots is hindered by two things: morphological diversity, which causes gradient conflicts when a single policy is trained across configurations, and a lack of simulators that support concurrent multi-morphology training.

Approach

The authors introduce M2oE, a backbone network whose structure mirrors the modular structure of the robot. It combines a shared pool of experts with an attention-based gating mechanism that dynamically routes module-specific inputs to those experts. An accompanying Isaac Lab extension enables concurrent simulation and training across multiple morphologies.
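The routing idea can be illustrated with a minimal sketch. This is not the authors' implementation: the class name ModularMoE, the linear tanh experts, the single attention head, and all dimensions are assumptions chosen for illustration. It only shows how one shared expert pool, gated per module from attention over all modules, can serve robots with different module counts.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weight matrix (stand-in for learned parameters)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ModularMoE:
    """Hypothetical sketch: shared experts with attention-based gating."""

    def __init__(self, obs_dim, act_dim, num_experts, key_dim=8, seed=0):
        rng = random.Random(seed)
        self.w_query = rand_matrix(key_dim, obs_dim, rng)
        self.w_key = rand_matrix(key_dim, obs_dim, rng)
        self.w_gate = rand_matrix(num_experts, obs_dim, rng)
        # One pool of experts shared by every module of every morphology.
        self.experts = [rand_matrix(act_dim, obs_dim, rng)
                        for _ in range(num_experts)]
        self.scale = math.sqrt(key_dim)

    def forward(self, module_obs):
        """module_obs: one observation vector per module (any module count)."""
        queries = [matvec(self.w_query, o) for o in module_obs]
        keys = [matvec(self.w_key, o) for o in module_obs]
        obs_dim = len(module_obs[0])
        actions, gates = [], []
        for obs, q in zip(module_obs, queries):
            # Attention of this module over all modules (assumed stand-in
            # for the "inter-module correlations" mentioned in the abstract).
            scores = [sum(a * b for a, b in zip(q, k)) / self.scale
                      for k in keys]
            attn = softmax(scores)
            context = [sum(w * o[d] for w, o in zip(attn, module_obs))
                       for d in range(obs_dim)]
            # Gate weights over the shared experts, derived from the context.
            gate = softmax(matvec(self.w_gate, context))
            # Each expert maps the module's own observation to an action.
            outputs = [[math.tanh(v) for v in matvec(E, obs)]
                       for E in self.experts]
            act_dim = len(outputs[0])
            action = [sum(g * out[j] for g, out in zip(gate, outputs))
                      for j in range(act_dim)]
            actions.append(action)
            gates.append(gate)
        return actions, gates


if __name__ == "__main__":
    policy = ModularMoE(obs_dim=4, act_dim=2, num_experts=3)
    rng = random.Random(1)
    for num_modules in (2, 4):  # two hypothetical morphologies
        obs = [[rng.uniform(-1, 1) for _ in range(4)]
               for _ in range(num_modules)]
        actions, gates = policy.forward(obs)
        print(num_modules, "modules ->", len(actions), "actions")
```

Because the parameters are indexed by expert rather than by module, the same policy object accepts any number of modules, and the per-module loop has no sequential dependency, so it could in principle be batched.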

Key results

Higher learning efficiency and lower tracking error than MLP and Transformer baselines
Single policy successfully controls Minimal, Dragon, and Tricycle Moonbot morphologies simultaneously
Module-wise parallelizable architecture mitigates gradient conflicts across configurations
Demonstrates zero-shot generalization to unseen wave terrains

Why it matters

Provides a scalable framework for training adaptable modular robots, accelerating development for space exploration and other complex reconfigurable robotic applications.

Abstract

Modular robots offer a promising solution for building versatile and adaptable robotic systems. For instance, space exploration robots can be designed to reconfigure to meet diverse task demands across varying environments. However, training such systems by Reinforcement Learning (RL) remains challenging due to the diversity of morphologies and the lack of simulation environments that support simultaneous multi-morphology learning. We present Modular Mixture of Experts (M2oE), a novel reinforcement learning backbone network that imitates the modular structure of robots to enable efficient and module-wise parallelizable policy learning for modular robots. In M2oE, the shared pool of experts, combined with an attention-based gating mechanism that dynamically selects experts based on inter-module correlations, enables both specialization and generalization. This structure supports training across multiple morphologies within a single framework, avoiding gradient conflicts and enhancing experience sharing across modules and morphologies. To support training, we also extend the Isaac Lab simulator with multi-morphology extensions that enable concurrent training across diverse robot configurations. Experiments on a space-exploration-inspired modular robot, Moonbot, demonstrate that M2oE significantly improves learning efficiency and achieves superior performance compared to both MLP and Transformer baselines. More information and the project video are available on the project website: https://ryuuchou17.github.io/m2oe/

Index terms

Cellular and Modular Robots, Reinforcement Learning