ICRA 2026

Opt2Skill: Imitating Dynamically-Feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation

Fukang Liu, Zhaoyuan Gu, Yilin Cai, Ziyi Zhou, Hyunyoung Jung, Jaehwi Jang, Shijie Zhao, Sehoon Ha, Yue Chen, Danfei Xu, Ye Zhao

AI summary

Combining model-based trajectory optimization with RL enables humanoid robots to perform complex loco-manipulation tasks with better motion tracking and higher task success rates than policies that imitate human demonstrations or inverse kinematics (IK) references.

Humanoid Robots, Loco-manipulation, Reinforcement Learning, Trajectory Optimization, Sim-to-Real

Problem

Humanoid control is hindered by high-dimensional dynamics and contact-rich interactions, where existing RL methods often produce unnatural motions and model-based methods are computationally expensive for real-time use.

Approach

The Opt2Skill pipeline uses Differential Dynamic Programming (DDP) to generate dynamically feasible, contact-consistent reference trajectories. RL policies are trained to track these references, and the trained policies are then transferred from simulation to hardware.
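As a rough illustration of what "training a policy to track a reference" can look like, here is a minimal sketch of a per-step tracking reward. It is not the paper's reward: the exponential form, the weights `w_q` and `w_tau`, the scales `sigma_q` and `sigma_tau`, and the function name `tracking_reward` are all assumptions made for this example. The optional torque term only mirrors the idea that torque information from trajectory optimization can be added to the tracking objective.

```python
import numpy as np


def tracking_reward(q, q_ref, tau=None, tau_ref=None,
                    w_q=1.0, w_tau=0.5, sigma_q=0.25, sigma_tau=10.0):
    """Hypothetical per-step reward for tracking a reference trajectory.

    q, q_ref     : measured and reference joint positions (rad)
    tau, tau_ref : optional measured and reference joint torques (N*m)
    All weights and scales are illustrative assumptions, not paper values.
    """
    q = np.asarray(q, dtype=float)
    q_ref = np.asarray(q_ref, dtype=float)

    # Joint-position term: equals w_q when tracking is perfect and
    # decays toward 0 as the squared error grows.
    reward = w_q * np.exp(-np.sum((q - q_ref) ** 2) / sigma_q ** 2)

    # Optional torque term, used only when both torque arrays are given.
    if tau is not None and tau_ref is not None:
        tau = np.asarray(tau, dtype=float)
        tau_ref = np.asarray(tau_ref, dtype=float)
        reward += w_tau * np.exp(-np.sum((tau - tau_ref) ** 2) / sigma_tau ** 2)

    return float(reward)


if __name__ == "__main__":
    # One step of a made-up 3-joint reference and a slightly-off measurement.
    q_ref = [0.10, -0.40, 0.80]
    q_meas = [0.12, -0.35, 0.78]
    print(tracking_reward(q_meas, q_ref))
```

In an RL training loop, a reward of this shape would be evaluated at every simulation step against the reference state for that time index, so the policy is rewarded for staying close to the optimized trajectory rather than for task-specific heuristics.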

Key results

Outperformed human demonstration and IK baselines in motion tracking accuracy and task success rates
Improved contact force tracking in contact-rich tasks by incorporating joint torque information from TO
Successful real-world deployment across diverse tasks including bulky-object handling, door traversing, and stair climbing
Achieved sim-to-real transfer to hardware without requiring online trajectory adaptation

Why it matters

This framework provides a scalable method for generating physically consistent motions that allow humanoid robots to perform versatile, high-dimensional loco-manipulation tasks in the real world.

Abstract

Humanoid robots are designed to perform diverse loco-manipulation tasks. However, they face challenges due to their high-dimensional and unstable dynamics, as well as the complex contact-rich nature of the tasks. Model-based optimal control methods offer flexibility to define precise motion but are limited by high computational complexity and the need for accurate contact sensing. On the other hand, reinforcement learning (RL) handles high-dimensional spaces with strong robustness but suffers from inefficient learning, unnatural motion, and sim-to-real gaps. To address these challenges, we introduce Opt2Skill, an end-to-end pipeline that combines model-based trajectory optimization with RL to achieve robust whole-body loco-manipulation. Opt2Skill generates dynamically feasible and contact-consistent reference motions for the Digit humanoid robot using differential dynamic programming (DDP) and trains RL policies to track these optimal trajectories. Our results demonstrate that Opt2Skill outperforms baselines that rely on human demonstrations and inverse kinematics-based references, both in motion tracking and task success rates. Furthermore, we show that incorporating trajectories with torque information improves contact force tracking in contact-involved tasks, such as wiping a table.

Index terms

Humanoid and Bipedal Locomotion; Whole-Body Motion Planning and Control; Reinforcement Learning