Adaptive Motion Priors with Constrained Optimization
Tuchapong Sangthaworn, Bawornsak Sakulkueakulsuk
AI summary
Problem
High-DOF humanoid robots face complex reward surfaces where balancing task objectives and motion quality requires manual tuning or fixed weights, leading to inefficient exploration and unstable learning.
Approach
The framework uses adaptive imitation guidance to learn human-like gaits, automatically detects learning convergence via percentile-based breakout detection, and switches to constrained optimization that dynamically balances task rewards within statistically guaranteed behavioral bounds.
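The percentile-based breakout idea can be illustrated with a minimal sketch. This is not the authors' implementation: the monitored signal, the function name `detect_breakout`, and the `window`, `percentile`, and `patience` parameters are all assumptions chosen for illustration. The sketch flags a breakout once several consecutive values of a monitored signal all exceed a high percentile of the window that preceded them.

```python
import numpy as np

def detect_breakout(history, window=50, percentile=95.0, patience=5):
    """Return the first index t at which the last `patience` values all exceed
    the given percentile of the `window` values before them, else None.

    All parameter names and defaults are hypothetical, not taken from the paper.
    """
    history = np.asarray(history, dtype=float)
    for t in range(window + patience - 1, len(history)):
        start_recent = t - patience + 1
        baseline = history[start_recent - window:start_recent]  # reference window
        recent = history[start_recent:t + 1]                     # candidate breakout
        threshold = np.percentile(baseline, percentile)
        if np.all(recent > threshold):
            return t
    return None

# Hypothetical usage: a signal that sits on a plateau, then jumps.
signal = [0.5] * 100 + [0.9] * 20
switch_step = detect_breakout(signal)
```

Keeping the baseline window separate from the most recent samples prevents the jump itself from inflating the threshold it is compared against.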
Key results
- Reduces energy consumption variance by 67–90% across baselines
- Achieves 70% lower energy consumption than the task-focused baseline
- Maintains velocity tracking accuracy comparable to top methods
- Adds negligible computational overhead (<0.012% per training cycle)
Why it matters
Provides a reliable, automated reward-shaping pipeline for humanoid locomotion that eliminates manual tuning and bridges human motion priors with task-specific optimization.
Abstract
Choosing a locomotion learning paradigm for a high-DOF system such as a humanoid robot poses several challenges. Free exploration creates complex reward surfaces that resist efficient exploration, while human motion priors cannot be copied directly because of differing mechanical constraints. We present Adaptive Motion Priors with Constrained Optimization (AMPCO), a novel framework that transitions from human reference motions to task-focused optimization within learned behavioral bounds. AMPCO employs a two-phase optimization strategy: (1) Adaptive Imitation Guidance, which prioritizes human motion, and (2) Adaptive Reward Weighting for Constrained Optimization, which optimizes task objectives while keeping motion quality within statistically guaranteed bounds from Phase I. The transition between phases is detected automatically through percentile-based breakout detection from discriminator convergence. AMPCO introduces adaptive weighting mechanisms that smoothly adjust the importance of human imitation based on learning progress. Our experiments on a simulated Unitree G1 humanoid robot demonstrate that AMPCO reduces energy consumption variance by 67-90% compared with all baseline methods and achieves 70% lower energy consumption than the task-focused baseline, while maintaining velocity tracking accuracy comparable to the best-performing methods, with minimal computational overhead (<0.012% per training cycle).
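As a rough illustration of the second phase, the sketch below derives behavioral bounds from Phase I statistics and nudges a task-reward weight up or down depending on whether a motion-quality score stays inside those bounds. The abstract does not specify the update rule, so everything here is an assumption: the function names, the 5th/95th percentile bounds, the fixed step size, and the convex combination of task and style rewards are hypothetical stand-ins rather than the AMPCO formulation.

```python
import numpy as np

def behavioral_bounds(phase1_scores, lower_pct=5.0, upper_pct=95.0):
    """Percentile bounds on a motion-quality score collected during Phase I.

    The percentile choice is an assumption, not a value from the paper.
    """
    scores = np.asarray(phase1_scores, dtype=float)
    lo, hi = np.percentile(scores, [lower_pct, upper_pct])
    return float(lo), float(hi)

def update_task_weight(w_task, style_score, bounds, step=0.05):
    """Raise the task weight while the score is in bounds, lower it otherwise."""
    lo, hi = bounds
    if lo <= style_score <= hi:
        w_task += step  # within bounds: shift emphasis toward the task
    else:
        w_task -= step  # out of bounds: restore emphasis on motion quality
    return float(np.clip(w_task, 0.0, 1.0))

def combined_reward(r_task, r_style, w_task):
    """Convex combination of task and style rewards (illustrative only)."""
    return w_task * r_task + (1.0 - w_task) * r_style

# Hypothetical usage with made-up Phase I scores.
bounds = behavioral_bounds([0.0, 1.0])
w = update_task_weight(0.5, style_score=0.5, bounds=bounds)
```

A fixed additive step is the simplest possible rule; a practical system would likely smooth the update or scale it by the size of the violation.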