Motion Generation for Modular Robots Using Hierarchical Policies
Kenjiro Minamikawa, Satoshi Yamamori, Satoshi Yagi, Sho Takeda, Kazuya Yoshida, Jun Morimoto
AI summary
Problem
Training separate RL policies for each modular robot morphology is computationally expensive and sample-inefficient, while end-to-end training fails to exploit module-specific roles.
Approach
The method splits control into two levels: a single lower-level reaching policy, trained once and then held fixed and shared across all arm modules, and an upper-level policy that dynamically generates goals for those modules to coordinate whole-body control across reconfigurable morphologies.
Key results
- Single shared lower-level reaching policy reused across three morphologies without retraining
- Scalable whole-body control across varying arm and wheel configurations
- Improved learning efficiency and interpretability over non-hierarchical baselines
- Dynamic upper-level goal generation coordinates locomotion and manipulation
Why it matters
Enables scalable, sample-efficient control for reconfigurable robots, advancing adaptive robotics and modular system design.
Abstract
Modular robots can be reconfigured into multiple morphologies, offering high adaptability for diverse tasks. However, reinforcement learning (RL)-based motion generation typically requires separate policy training for each morphology, and end-to-end training often fails to exploit module-specific roles. This paper proposes a hierarchical policy framework that explicitly separates control at the module level, learning reusable motion skills for each module and coordinating them with an upper-level policy for whole-body control. A single lower-level reaching policy, shared across all arm modules, is trained once and reused across morphologies, ensuring that module-specific functions are preserved even as complexity increases. The method is evaluated on the modular robot MoonBot in simulation, demonstrating scalable control of diverse morphologies and improved learning efficiency and interpretability over non-hierarchical baselines.
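The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: both policies are hand-written stand-ins rather than trained networks, and every name here (`Module`, `shared_reaching_policy`, `upper_level_policy`, `make_morphology`, the gain and step limit) is a hypothetical chosen for illustration. The point it shows is structural: the same lower-level function is called for every arm module, and only the upper level sees how many modules the current morphology has.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class Module:
    """Hypothetical arm module: a mounting offset and a current tip position."""
    offset: Vector
    tip: Vector


def shared_reaching_policy(tip: Vector, goal: Vector) -> Vector:
    """Stand-in for the shared lower-level reaching policy (assumption).

    A clipped proportional step toward the goal replaces the trained network.
    """
    gain, limit = 0.5, 0.1
    return [max(-limit, min(limit, gain * (g - p))) for p, g in zip(tip, goal)]


def upper_level_policy(body_target: Vector, modules: List[Module]) -> List[Vector]:
    """Stand-in for the upper-level policy (assumption).

    Emits one reaching goal per module, so its output size follows the morphology.
    """
    return [[b + o for b, o in zip(body_target, m.offset)] for m in modules]


def step(body_target: Vector, modules: List[Module]) -> List[Vector]:
    """One control step: the upper level sets goals, the shared lower level acts."""
    goals = upper_level_policy(body_target, modules)
    for module, goal in zip(modules, goals):
        action = shared_reaching_policy(module.tip, goal)  # same policy for every module
        module.tip = [p + a for p, a in zip(module.tip, action)]
    return goals


def make_morphology(n_arms: int) -> List[Module]:
    """Build a toy morphology with n_arms modules spaced along the x axis."""
    return [Module(offset=[float(i), 0.0, 0.0], tip=[0.0, 0.0, 0.0])
            for i in range(n_arms)]


if __name__ == "__main__":
    for n_arms in (1, 2, 4):
        modules = make_morphology(n_arms)
        for _ in range(200):
            goals = step([1.0, 1.0, 0.0], modules)
        print(n_arms, [[round(x, 3) for x in m.tip] for m in modules])
```

Changing the morphology in this sketch only changes the list passed to `upper_level_policy`; `shared_reaching_policy` is untouched, which mirrors the claim that the lower-level policy is reused without retraining.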