ICRA 2026

Motion Generation for Modular Robots Using Hierarchical Policies

Kenjiro Minamikawa, Satoshi Yamamori, Satoshi Yagi, Sho Takeda, Kazuya Yoshida, Jun Morimoto

AI summary

A hierarchical reinforcement learning framework lets modular robots reuse a learned module-level motion skill across different morphologies without retraining that skill, improving learning efficiency and scalability.
Modular robots, hierarchical reinforcement learning, motion generation, skill reuse, reconfigurable control, goal-conditioned RL

Problem

Training separate RL policies for each modular robot morphology is computationally expensive and sample-inefficient, while end-to-end training fails to exploit module-specific roles.

Approach

The method separates control into two levels. A single lower-level reaching policy, shared across all arm modules, is trained once and then kept fixed, so it serves as a reusable module-level skill. An upper-level policy dynamically generates goals for these modules, coordinating them for whole-body control across reconfigurable morphologies.
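The two-level split can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names `SharedReachingPolicy` and `UpperLevelPolicy`, the 3-D tip positions, and the hand-written controllers that stand in for trained networks. It does not reproduce the paper's policies, observation spaces, or the MoonBot simulation; it only shows one lower-level policy object being reused, unchanged, for morphologies with different numbers of arm modules.

```python
import math
from typing import List

Vector = List[float]


class SharedReachingPolicy:
    """Hypothetical lower-level, goal-conditioned reaching policy.

    Stand-in for a trained network: a clipped proportional step toward
    the goal. One instance is shared by every arm module.
    """

    def __init__(self, gain: float = 0.5, max_step: float = 0.1) -> None:
        self.gain = gain
        self.max_step = max_step

    def act(self, tip: Vector, goal: Vector) -> Vector:
        # Per-axis displacement toward the goal, limited to max_step.
        return [
            max(-self.max_step, min(self.max_step, self.gain * (g - p)))
            for p, g in zip(tip, goal)
        ]


class UpperLevelPolicy:
    """Hypothetical upper-level policy that emits one goal per module.

    Stand-in for a learned goal generator: goals are spread along the
    y-axis around a task target.
    """

    def goals(self, task_target: Vector, num_modules: int) -> List[Vector]:
        return [
            [task_target[0], task_target[1] + 0.2 * i, task_target[2]]
            for i in range(num_modules)
        ]


def rollout(num_modules: int, lower: SharedReachingPolicy, steps: int = 50) -> List[float]:
    """Run one episode and return each module's final distance to its goal."""
    upper = UpperLevelPolicy()
    task_target = [0.5, 0.0, 0.1]
    tips = [[0.0, 0.0, 0.0] for _ in range(num_modules)]
    for _ in range(steps):
        goals = upper.goals(task_target, num_modules)
        for i in range(num_modules):
            # Every module queries the same lower-level policy.
            delta = lower.act(tips[i], goals[i])
            tips[i] = [p + d for p, d in zip(tips[i], delta)]
    goals = upper.goals(task_target, num_modules)
    return [math.dist(t, g) for t, g in zip(tips, goals)]


if __name__ == "__main__":
    lower = SharedReachingPolicy()  # created once, never modified
    for n in (2, 3, 4):  # three morphologies with different arm counts
        errors = rollout(n, lower)
        print(n, "modules, max goal error:", max(errors))
```

In this sketch only the upper level depends on the number of modules, which is the property the hierarchical design relies on: changing the morphology changes how many goals are issued, not the module-level skill that tracks them.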

Key results

  • Single shared lower-level reaching policy reused across three morphologies without retraining
  • Scalable whole-body control across varying arm and wheel configurations
  • Improved learning efficiency and interpretability over non-hierarchical baselines
  • Dynamic upper-level goal generation coordinates locomotion and manipulation

Why it matters

Enables scalable, sample-efficient control for reconfigurable robots: the module-level reaching skill is trained once and carried over when the morphology changes, rather than being relearned for each configuration.

Abstract

Modular robots can be reconfigured into multiple morphologies, offering high adaptability for diverse tasks. However, reinforcement learning (RL)-based motion generation typically requires separate policy training for each morphology, and end-to-end training often fails to exploit module-specific roles. This paper proposes a hierarchical policy framework that explicitly separates control at the module level, learning reusable motion skills for each module and coordinating them with an upper-level policy for whole-body control. A single lower-level reaching policy, shared across all arm modules, is trained once and reused across morphologies, ensuring that module-specific functions are preserved even as complexity increases. The method is evaluated on the modular robot MoonBot in simulation, demonstrating scalable control of diverse morphologies and improved learning efficiency and interpretability over non-hierarchical baselines.

Index terms

Reinforcement Learning; Legged Robots; Cellular and Modular Robots
