ICRA 2026

Adaptive Motion Priors with Constrained Optimization

Tuchapong Sangthaworn, Bawornsak Sakulkueakulsuk

AI summary

AMPCO automatically transitions from human motion imitation to task-focused optimization, cutting energy consumption by 70% relative to a task-focused baseline and energy-consumption variance by up to 90%, while maintaining velocity tracking accuracy on a simulated humanoid robot.

Humanoid locomotion; Adaptive reward weighting; Constrained optimization; Motion priors; Reinforcement learning; Phase transition detection

Problem

High-DOF humanoid robots face complex reward surfaces where balancing task objectives and motion quality requires manual tuning or fixed weights, leading to inefficient exploration and unstable learning.

Approach

The framework uses adaptive imitation guidance to learn human-like gaits, automatically detects learning convergence via percentile-based breakout detection, and switches to constrained optimization that dynamically balances task rewards within statistically guaranteed behavioral bounds.
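
The summary does not spell out the detection rule, so the following is only a minimal sketch of one possible reading of percentile-based breakout detection. It assumes a scalar convergence signal (for example a discriminator loss) and treats any value outside the percentile band of its recent history as a "breakout"; the phase switch fires once no breakout has occurred for a set number of updates. The class name `BreakoutDetector`, the window size, the percentile levels, and the `patience` parameter are all hypothetical, and the paper's actual criterion may differ.

```python
from collections import deque


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


class BreakoutDetector:
    """Hypothetical detector (an assumption, not the paper's rule).

    Signals convergence once the monitored value has stayed inside the
    [low_q, high_q] percentile band of its recent history for `patience`
    consecutive updates.
    """

    def __init__(self, window=50, low_q=5.0, high_q=95.0, patience=10):
        self.history = deque(maxlen=window)
        self.low_q = low_q
        self.high_q = high_q
        self.patience = patience
        self.calm_steps = 0

    def update(self, value):
        """Feed one measurement; return True when the phase switch should fire."""
        if len(self.history) == self.history.maxlen:
            lo = percentile(self.history, self.low_q)
            hi = percentile(self.history, self.high_q)
            # A value outside the band is a breakout: the signal is still moving,
            # so the count of calm updates restarts.
            self.calm_steps = self.calm_steps + 1 if lo <= value <= hi else 0
        self.history.append(value)
        return self.calm_steps >= self.patience
```

In a training loop, `update` would be called once per iteration with the chosen convergence signal, and the first `True` would mark the switch from the imitation phase to constrained optimization.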

Key results

Reduces energy consumption variance by 67–90% relative to all baseline methods
Achieves 70% lower energy consumption than the task-focused baseline
Maintains velocity tracking accuracy comparable to the best-performing methods
Adds negligible computational overhead (<0.012% per training cycle)

Why it matters

Provides a reliable, automated reward-shaping pipeline for humanoid locomotion that eliminates manual tuning and bridges human motion priors with task-specific optimization.

Abstract

Choosing a locomotion learning paradigm for a high-DOF system such as a humanoid robot poses several challenges. Free exploration creates complex reward surfaces that resist efficient exploration, while human motion priors cannot be copied directly because of differing mechanical constraints. We present Adaptive Motion Priors with Constrained Optimization (AMPCO), a novel framework that transitions from human reference motions to task-focused optimization within learned behavioral bounds. AMPCO employs a two-phase optimization strategy: (1) Adaptive Imitation Guidance, which prioritizes human motion, and (2) Adaptive Reward Weighting for Constrained Optimization, which optimizes task objectives while maintaining motion quality within statistically guaranteed bounds from Phase I. The transition between phases is detected automatically through percentile-based breakout detection from discriminator convergence. AMPCO introduces adaptive weighting mechanisms that smoothly adjust the importance of human imitation based on learning progress. Our experiments on the Unitree G1 humanoid robot in simulation demonstrate that AMPCO reduces energy consumption variance by 67–90% across all baseline methods and achieves 70% lower energy consumption than the task-focused baseline, while maintaining velocity tracking accuracy comparable to the best-performing methods, with minimal computational overhead (<0.012% per training cycle).
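
The abstract names the two phases without giving their update rules, so the sketch below only illustrates the general idea under stated assumptions. Phase I is modeled as a linear shift of reward weight from imitation to task as learning progresses. Phase II is modeled as a task weight that grows while a motion-quality metric stays inside a percentile band recorded during Phase I and shrinks after a violation. All function names, the linear schedule, the 5th/95th percentile band, and the step sizes are hypothetical choices, not the paper's formulation.

```python
import statistics


def imitation_weight(progress, w_start=0.9, w_end=0.1):
    """Phase I (assumed schedule): move weight from imitation to task
    linearly as `progress` goes from 0 to 1."""
    p = min(max(progress, 0.0), 1.0)
    return w_start + (w_end - w_start) * p


def phase_one_reward(r_task, r_imitation, progress):
    """Blend the imitation and task rewards with the current imitation weight."""
    w = imitation_weight(progress)
    return w * r_imitation + (1.0 - w) * r_task


def bounds_from_samples(samples, low_q=5, high_q=95):
    """Percentile band of a motion-quality metric logged during Phase I.

    `low_q` and `high_q` are integer percentiles in 1..99.
    """
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[low_q - 1], cuts[high_q - 1]


def update_task_weight(w_task, metric, bounds, step=0.05, w_min=0.1, w_max=1.0):
    """Phase II (assumed rule): raise the task weight while the metric stays
    inside the Phase I band, lower it after a violation."""
    lo, hi = bounds
    if lo <= metric <= hi:
        return min(w_task + step, w_max)
    return max(w_task - step, w_min)
```

A percentile band is only an empirical stand-in here; the statistical guarantee the abstract mentions would depend on how the paper actually derives its bounds.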

Index terms

Imitation Learning; Humanoid and Bipedal Locomotion; Reinforcement Learning