Learning Humanoid Arm Motion Via Centroidal Momentum Regularized Multi-Agent Reinforcement Learning
Ho Jae Lee, Se Hwan Jeon, Sangbae Kim
AI summary
Problem
Coordinating arm and leg motion in humanoid robots is challenging because of complex whole-body dynamics and conflicting reinforcement learning rewards. Prior methods either rely on computationally expensive optimization or lack clear physical guidance for arm control.
Approach
The authors separate arm and leg control into distinct multi-agent RL policies trained with centralized critics but executed with decentralized actors that share only base states and centroidal angular momentum. A physics-inspired reward guides the arm agent to regulate angular momentum, promoting natural swing and balance.
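The observation split described above can be illustrated with a minimal sketch. It assumes that each actor receives the shared signals (base state and centroidal angular momentum) plus its own limb's joint readings, and that the centralized critic receives everything. The function name, argument names, and vector sizes are hypothetical and not taken from the paper; for simplicity a single critic input is built, although the paper trains one critic per agent.

```python
import numpy as np

def build_observations(base_state, cam, leg_joints, arm_joints):
    """Split one whole-body state into per-agent actor inputs and a critic input.

    All argument names and layouts are illustrative assumptions.
    """
    # Signals shared by both decentralized actors: base state and CAM.
    shared = np.concatenate([base_state, cam])
    # Each actor additionally sees only its own limb's joint readings.
    leg_obs = np.concatenate([shared, leg_joints])
    arm_obs = np.concatenate([shared, arm_joints])
    # A centralized critic is given the full state during training only.
    critic_obs = np.concatenate([shared, leg_joints, arm_joints])
    return leg_obs, arm_obs, critic_obs

# Placeholder sizes chosen arbitrarily for the example.
leg_obs, arm_obs, critic_obs = build_observations(
    base_state=np.zeros(9),
    cam=np.zeros(3),
    leg_joints=np.zeros(20),
    arm_joints=np.zeros(16),
)
```

In this sketch the arm actor never sees leg joint readings directly. Any coordination at execution time has to come through the shared base state and CAM, which is the property the decentralized-actor design relies on.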
Key results
- Emergent anti-phase arm swing that reduces vertical ground reaction moments
- Accurate centroidal angular momentum tracking under varying locomotion tasks
- Robust recovery from external torque disturbances within one second
- Successful hardware deployment on the MIT Humanoid across flat, rough, and stair terrains
Why it matters
Offers a scalable, biologically inspired control paradigm that enhances humanoid locomotion stability and robustness without relying on heavy model-based optimization.
Abstract
Humans naturally swing their arms during locomotion to regulate whole-body dynamics, reduce angular momentum, and help maintain balance. Inspired by this principle, we present a limb-level multi-agent reinforcement learning (RL) framework that enables coordinated whole-body control of humanoid robots through emergent arm motion. Our approach employs separate actor-critic structures for the arms and legs, trained with centralized critics but decentralized actors that share only base states and centroidal angular momentum (CAM) observations, allowing each agent to specialize in task-relevant behaviors through modular reward design. The arm agent, guided by CAM tracking and damping rewards, promotes arm motions that reduce overall angular momentum and vertical ground reaction moments, contributing to improved balance during locomotion or under external perturbations. Comparative studies with single-agent and alternative multi-agent baselines further validate the effectiveness of our approach. Finally, we deploy the learned policy on the MIT Humanoid, achieving robust performance across diverse locomotion tasks, including flat-ground walking, rough terrain traversal, and stair climbing. (see project page)
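To make the CAM quantity concrete, the sketch below computes angular momentum about the center of mass for a set of point masses and combines it into a tracking-plus-damping reward. This is a simplified illustration, not the paper's formulation: it ignores each link's rotational inertia, and the exponential tracking term, the rate-based damping term, and the parameters `sigma` and `damping_weight` are all assumptions made for the example.

```python
import numpy as np

def centroidal_angular_momentum(masses, positions, velocities):
    """Angular momentum about the center of mass, treating links as point masses.

    Simplification: per-link rotational inertia is ignored.
    """
    masses = np.asarray(masses, dtype=float)
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    # Center of mass of the point-mass system.
    com = (masses[:, None] * positions).sum(axis=0) / masses.sum()
    # Sum of m_i * (r_i - c) x v_i over all links.
    return (masses[:, None] * np.cross(positions - com, velocities)).sum(axis=0)

def cam_reward(cam, cam_ref, cam_prev, dt, sigma=0.5, damping_weight=0.01):
    """Hypothetical reward: track a CAM reference and penalize fast CAM changes."""
    cam, cam_ref, cam_prev = (np.asarray(x, dtype=float) for x in (cam, cam_ref, cam_prev))
    # Tracking term: equals 1 when the CAM matches the reference exactly.
    tracking = np.exp(-np.sum((cam - cam_ref) ** 2) / sigma)
    # Damping term: penalizes the finite-difference rate of change of CAM.
    cam_rate = (cam - cam_prev) / dt
    damping = -damping_weight * np.sum(cam_rate ** 2)
    return tracking + damping

# Two 1 kg masses spinning about the z-axis in opposite positions.
cam = centroidal_angular_momentum(
    masses=[1.0, 1.0],
    positions=[[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    velocities=[[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]],
)
reward = cam_reward(cam, cam_ref=np.zeros(3), cam_prev=cam, dt=0.02)
```

For the two-mass example, the angular momentum is 2 kg·m²/s about the z-axis. A pure translation of the same masses would give zero, which is why CAM isolates the rotational effect that arm swing can counteract.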