Learning Motion Skills with Adaptive Assistive Curriculum Force in Humanoid Robots
Zhanxiang Cao, Yang Zhang, Buqing Nie, Huangxuan Lin, Haoyang Li, Yizhi Chen, Xiaokang Yang, Yue Gao
AI summary
Problem
Humanoid robots struggle to efficiently and stably learn complex motion skills due to slow exploration, instability, and susceptibility to local optima in reinforcement learning. Existing methods often rely on fixed rewards, expert demonstrations, or non-adaptive external aids that do not scale well to high-dimensional control tasks.
Approach
The authors propose A2CF, a dual-agent reinforcement learning framework where a dedicated assistive force agent applies state-dependent guidance that is gradually reduced via a curriculum, combined with privileged information and random masking to prevent over-reliance and improve generalization.
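The curriculum and masking ideas can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`update_force_bound`, `apply_assist`), the success-rate trigger, the multiplicative decay, and all numeric values are assumptions chosen only to show how a force bound might shrink as proficiency improves and how random masking might withhold assistance.

```python
import random


def update_force_bound(bound, success_rate, threshold=0.8, decay=0.9, cutoff=1.0):
    """Hypothetical curriculum step: shrink the maximum assistive force
    once the robot succeeds often enough; drop to zero below a cutoff."""
    if success_rate >= threshold:
        bound *= decay
    # Below the (assumed) cutoff, remove assistance entirely so the
    # final policy is support-free.
    return bound if bound >= cutoff else 0.0


def apply_assist(raw_force, bound, mask_prob, rng):
    """Hypothetical assist step: randomly mask the force to zero,
    otherwise clip the assist agent's proposed force to [-bound, bound]."""
    if rng.random() < mask_prob:
        return 0.0  # masked: the robot must act without help this step
    return max(-bound, min(bound, raw_force))


if __name__ == "__main__":
    rng = random.Random(0)
    bound = 100.0  # assumed task-specific initial force bound (newtons)
    for epoch in range(60):
        # Stand-in for a measured success rate that improves over training.
        success_rate = min(1.0, 0.5 + 0.02 * epoch)
        bound = update_force_bound(bound, success_rate)
    force = apply_assist(raw_force=150.0, bound=bound, mask_prob=0.2, rng=rng)
    print(f"final bound: {bound:.2f}, applied force: {force:.2f}")
```

In this sketch the bound stays fixed while the success rate is below the threshold, then decays geometrically until it falls under the cutoff and assistance is removed. A2CF itself uses a learned assistive force agent rather than a fixed rule, so the schedule here should be read only as an illustration of the general principle.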
Key results
- Converges 30% faster than baseline methods across walking, dancing, and backflip tasks.
- Reduces training failure rates by over 40%.
- Successfully transfers learned policies to a physical Unitree G1 humanoid robot without fine-tuning.
- Ablation studies confirm that privileged information, task-specific initial force bounds, and random masking each significantly contribute to learning stability and efficiency.
Why it matters
A2CF offers a scalable, human-inspired training paradigm that accelerates the acquisition of complex whole-body skills in high-dimensional robotic systems and helps bridge the gap between simulation and real-world deployment.
Abstract
Learning policies for complex humanoid tasks remains both challenging and compelling. Inspired by how infants and athletes rely on external support (such as parental walkers or coach-applied guidance) to acquire skills like walking, dancing, and performing acrobatic flips, we propose A2CF: Adaptive Assistive Curriculum Force for humanoid motion learning. A2CF trains a dual-agent system, in which a dedicated assistive force agent applies state-dependent forces to guide the robot through difficult initial motions and gradually reduces assistance as the robot’s proficiency improves. Across three benchmarks (bipedal walking, choreographed dancing, and backflips), A2CF achieves convergence 30% faster than baseline methods, lowers failure rates by over 40%, and ultimately produces robust, support-free policies. Real-world experiments further demonstrate that adaptively applied assistive forces significantly accelerate the acquisition of complex skills in high-dimensional robotic control.