ICRA 2026

Learning Humanoid Arm Motion Via Centroidal Momentum Regularized Multi-Agent Reinforcement Learning

Ho Jae Lee, Se Hwan Jeon, Sangbae Kim

AI summary

Centroidal momentum regularization in a multi-agent RL framework enables humanoid robots to learn natural arm swings that significantly improve locomotion stability and disturbance recovery.

Humanoid Locomotion, Multi-Agent Reinforcement Learning, Centroidal Angular Momentum, Whole-Body Control, Disturbance Recovery, Physics-Guided Learning

Problem

Coordinating arm and leg motion in humanoid robots is challenging because whole-body dynamics are complex and reinforcement learning rewards can conflict. Prior methods either rely on computationally expensive optimization or lack clear physical guidance for arm control.

Approach

The authors split arm and leg control into separate agents within a multi-agent RL framework. Each agent is trained with a centralized critic but executed by a decentralized actor, and the actors share only base states and centroidal angular momentum (CAM). A physics-inspired reward guides the arm agent to regulate angular momentum, promoting natural arm swing and balance.
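To make the information split concrete, here is a minimal sketch of how observations might be partitioned under centralized training with decentralized execution. The summary states only that the actors share base states and centroidal angular momentum (CAM); giving each actor its own limb's joint state, and each centralized critic the union of everything, is an assumption. The function name `split_observations` and all vector sizes are hypothetical.

```python
import numpy as np


def split_observations(base_state, cam, arm_joints, leg_joints):
    """Partition observations for a two-agent (arms/legs) CTDE setup.

    HYPOTHETICAL layout: each decentralized actor gets the shared signals
    (base state and CAM) plus its own limb's joint state; the centralized
    critic input concatenates everything.
    """
    shared = np.concatenate([base_state, cam])
    arm_obs = np.concatenate([shared, arm_joints])  # arm actor input
    leg_obs = np.concatenate([shared, leg_joints])  # leg actor input
    critic_obs = np.concatenate([shared, arm_joints, leg_joints])  # critic input
    return arm_obs, leg_obs, critic_obs


# Illustrative sizes only (not the MIT Humanoid's actual dimensions).
base_state = np.zeros(9)         # e.g. base angular velocity, gravity direction, command
cam = np.array([0.0, 0.0, 0.5])  # centroidal angular momentum (x, y, z)
arm_joints = np.full(16, 1.0)    # arm joint positions + velocities
leg_joints = np.full(20, 2.0)    # leg joint positions + velocities

arm_obs, leg_obs, critic_obs = split_observations(base_state, cam, arm_joints, leg_joints)
print(arm_obs.shape, leg_obs.shape, critic_obs.shape)  # (28,) (32,) (48,)
```

Because neither actor sees the other limb's joint state, coordination at run time has to flow through the shared base state and CAM signals, while the critics (both of which could take `critic_obs` in this sketch) can still exploit whole-body information during training.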

Key results

Emergent anti-phase arm swing that reduces vertical ground reaction moments
Accurate centroidal angular momentum tracking under varying locomotion tasks
Robust recovery from external torque disturbances within one second
Successful hardware deployment on the MIT Humanoid across flat, rough, and stair terrains
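A standard centroidal-dynamics identity (not stated in the summary itself) helps explain why an arm swing that regulates CAM can lower the vertical ground reaction moment. Assuming the only external loads are gravity and foot contacts (no disturbance), the rate of change of the centroidal angular momentum $\mathbf{L}_G$ equals the net contact moment about the center of mass $\mathbf{p}_G$, since gravity acts at $\mathbf{p}_G$ and contributes no moment. Here $\mathcal{C}$ is the set of foot contacts, $\mathbf{p}_c$ a contact point, $\mathbf{f}_c$ its contact force, and $\boldsymbol{\tau}_c$ its contact moment.

```latex
\dot{\mathbf{L}}_G = \sum_{c \in \mathcal{C}} \Big[ (\mathbf{p}_c - \mathbf{p}_G) \times \mathbf{f}_c + \boldsymbol{\tau}_c \Big],
\qquad
\dot{L}_{G,z} = \sum_{c \in \mathcal{C}} \Big[ \big( (\mathbf{p}_c - \mathbf{p}_G) \times \mathbf{f}_c \big)_z + \tau_{c,z} \Big]
```

As a qualitative reading rather than the paper's own analysis: without arm swing, the yaw angular momentum produced by the swinging legs must be offset either by rotating the torso or by a vertical moment from the stance foot. If anti-phase arm motion keeps $L_{G,z}$ nearly constant, the right-hand side of the second equation stays near zero and the feet need to supply less net vertical moment.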

Why it matters

Offers a scalable, biologically inspired control paradigm that enhances humanoid locomotion stability and robustness without relying on heavy model-based optimization.

Abstract

Humans naturally swing their arms during locomotion to regulate whole-body dynamics, reduce angular momentum, and help maintain balance. Inspired by this principle, we present a limb-level multi-agent reinforcement learning (RL) framework that enables coordinated whole-body control of humanoid robots through emergent arm motion. Our approach employs separate actor-critic structures for the arms and legs, trained with centralized critics but decentralized actors that share only base states and centroidal angular momentum (CAM) observations, allowing each agent to specialize in task-relevant behaviors through modular reward design. The arm agent, guided by CAM tracking and damping rewards, promotes arm motions that reduce overall angular momentum and vertical ground reaction moments, contributing to improved balance during locomotion or under external perturbations. Comparative studies with single-agent and alternative multi-agent baselines further validate the effectiveness of our approach. Finally, we deploy the learned policy on the MIT Humanoid, achieving robust performance across diverse locomotion tasks, including flat-ground walking, rough terrain traversal, and stair climbing.
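The abstract names CAM tracking and damping rewards but does not give their form here. The sketch below first computes CAM for a set of rigid links using the textbook definition (orbital term plus spin term about the whole-body center of mass), then evaluates a reward with an assumed shape: an exponential tracking kernel plus a penalty on the finite-difference rate of change of CAM. The names `centroidal_angular_momentum` and `arm_reward`, the kernel, the zero reference, `sigma`, and both weights are hypothetical choices for illustration, not the authors' definitions.

```python
import numpy as np


def centroidal_angular_momentum(masses, positions, velocities, inertias, omegas):
    """Angular momentum of a rigid-body system about its center of mass (CoM).

    masses: (n,) link masses; positions, velocities, omegas: (n, 3) link CoM
    positions, linear velocities, and angular velocities in the world frame;
    inertias: (n, 3, 3) link inertia tensors about each link CoM, world frame.
    """
    masses = np.asarray(masses, dtype=float)
    com = masses @ positions / masses.sum()
    # Orbital part: sum of (p_i - p_G) x (m_i v_i)
    orbital = np.cross(positions - com, masses[:, None] * velocities).sum(axis=0)
    # Spin part: sum of I_i @ omega_i over all links
    spin = np.einsum("nij,nj->i", inertias, omegas)
    return orbital + spin


def arm_reward(cam, cam_ref, cam_prev, dt, sigma=0.5, w_track=1.0, w_damp=0.05):
    """HYPOTHETICAL arm reward: CAM tracking plus a damping penalty.

    The exponential kernel, the finite-difference damping term, and all
    weights are assumptions for illustration, not the paper's definitions.
    """
    track = np.exp(-np.sum((cam - cam_ref) ** 2) / sigma)
    cam_rate = (cam - cam_prev) / dt  # finite-difference estimate of dL/dt
    damp = -np.sum(cam_rate ** 2)     # penalize fast changes in CAM
    return w_track * track + w_damp * damp


# Toy check with four point masses (zero rotational inertia): a "leg" pair
# orbiting counterclockwise about +z and an "arm" pair orbiting clockwise.
masses = np.ones(4)
positions = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]], dtype=float)
velocities = np.array([[0, 1, 0], [0, -1, 0], [1, 0, 0], [-1, 0, 0]], dtype=float)
inertias = np.zeros((4, 3, 3))
omegas = np.zeros((4, 3))

# Legs alone carry +2 units of yaw CAM; the counter-rotating arms cancel it.
legs_only = centroidal_angular_momentum(
    masses[:2], positions[:2], velocities[:2], inertias[:2], omegas[:2]
)
whole_body = centroidal_angular_momentum(masses, positions, velocities, inertias, omegas)

zero_ref = np.zeros(3)
print(arm_reward(legs_only, zero_ref, legs_only, dt=0.02))
print(arm_reward(whole_body, zero_ref, whole_body, dt=0.02))
```

With a zero reference, the assumed reward is higher in the whole-body case, where the arm pair cancels the legs' yaw momentum, which mirrors the anti-phase arm swing described above. Whether the paper tracks a zero reference or a task-dependent one (for example, during turning) is not stated in this summary.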

Index terms

Humanoid and Bipedal Locomotion, Reinforcement Learning, Whole-Body Motion Planning and Control