Opt2Skill: Imitating Dynamically-Feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation
Fukang Liu, Zhaoyuan Gu, Yilin Cai, Ziyi Zhou, Hyunyoung Jung, Jaehwi Jang, Shijie Zhao, Sehoon Ha, Yue Chen, Danfei Xu, Ye Zhao
AI summary
Problem
Humanoid control is hindered by high-dimensional dynamics and contact-rich interactions, where existing RL methods often produce unnatural motions and model-based methods are computationally expensive for real-time use.
Approach
The Opt2Skill pipeline uses differential dynamic programming (DDP) to generate dynamically feasible reference trajectories, then trains RL policies to track those trajectories in simulation and transfers the policies to the real robot.
Key results
- Outperformed baselines built on human demonstrations and inverse kinematics (IK) references in motion tracking accuracy and task success rate
- Improved contact force tracking in contact-rich tasks by incorporating joint torque information from trajectory optimization (TO)
- Successful real-world deployment across diverse tasks including bulky-object handling, door traversing, and stair climbing
- Achieved sim-to-real transfer to hardware without requiring online trajectory adaptation
Why it matters
This framework provides a scalable method for generating physically consistent motions that allow humanoid robots to perform versatile, high-dimensional loco-manipulation tasks in the real world.
Abstract
Humanoid robots are designed to perform diverse loco-manipulation tasks. However, they face challenges due to their high-dimensional and unstable dynamics, as well as the complex contact-rich nature of the tasks. Model-based optimal control methods offer flexibility to define precise motion but are limited by high computational complexity and the need for accurate contact sensing. On the other hand, reinforcement learning (RL) handles high-dimensional spaces with strong robustness but suffers from inefficient learning, unnatural motion, and sim-to-real gaps. To address these challenges, we introduce Opt2Skill, an end-to-end pipeline that combines model-based trajectory optimization with RL to achieve robust whole-body loco-manipulation. Opt2Skill generates dynamically feasible and contact-consistent reference motions for the Digit humanoid robot using differential dynamic programming (DDP) and trains RL policies to track these optimal trajectories. Our results demonstrate that Opt2Skill outperforms baselines that rely on human demonstrations and inverse kinematics-based references, both in motion tracking and task success rates. Furthermore, we show that incorporating trajectories with torque information improves contact force tracking in contact-involved tasks, such as wiping a table.
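
To make the idea of tracking an optimized trajectory with torque information concrete, the following is a minimal sketch of a per-step tracking reward. It is not the reward used in the paper: the function name `tracking_reward`, the exponential kernel, the weights `w_q` and `w_tau`, and the scales `sigma_q` and `sigma_tau` are all assumptions chosen for illustration. It only shows how a reference joint position and a reference joint torque from trajectory optimization could both enter an RL reward.

```python
import math


def tracking_reward(q, q_ref, tau, tau_ref,
                    w_q=0.7, w_tau=0.3, sigma_q=0.25, sigma_tau=10.0):
    """Hypothetical per-step reward for tracking a reference trajectory.

    q, q_ref     : measured and reference joint positions (rad)
    tau, tau_ref : measured and reference joint torques (N*m)
    All weights and scales are illustrative assumptions, not values
    taken from the paper.
    """
    if not (len(q) == len(q_ref) == len(tau) == len(tau_ref)):
        raise ValueError("all inputs must have the same number of joints")

    # Sum of squared tracking errors over all joints.
    q_err = sum((a - b) ** 2 for a, b in zip(q, q_ref))
    tau_err = sum((a - b) ** 2 for a, b in zip(tau, tau_ref))

    # Exponential kernels map each error to (0, 1], with 1 at zero error.
    r_q = math.exp(-q_err / sigma_q ** 2)
    r_tau = math.exp(-tau_err / sigma_tau ** 2)

    # Weighted sum: the torque term is what a kinematics-only
    # reference (e.g. from IK) could not provide.
    return w_q * r_q + w_tau * r_tau


if __name__ == "__main__":
    q_ref, tau_ref = [0.10, -0.40], [5.0, -12.0]
    print(tracking_reward([0.12, -0.38], q_ref, [4.0, -11.0], tau_ref))
```

Under these assumptions, setting `w_tau=0.0` reduces the reward to position tracking only, which is one simple way to compare references with and without torque information.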