PPF: Pre-Training and Preservative Fine-Tuning of Humanoid Locomotion Via Model-Assumption-Based Regularization
Hyunyoung Jung, Zhaoyuan Gu, Ye Zhao, Hae-Won Park, Sehoon Ha
AI summary
Problem
Humanoid locomotion requires adapting to complex, unpredictable environments, but when learning-based policies are fine-tuned with reinforcement learning they often suffer catastrophic forgetting of the stable, periodic gaits learned during pre-training, leading to instability and poor sim-to-real transfer.
Approach
The framework pre-trains a neural policy by imitating a model-based controller, then fine-tunes it via reinforcement learning. During fine-tuning, a regularization term pulls the policy's actions toward the controller's actions, and its weight is adjusted dynamically according to how much the robot's current state violates the controller's underlying model assumptions.
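A minimal sketch of a state-dependent regularization weight, assuming the weight decays exponentially with a scalar "assumption violation" score: where the model assumption holds (violation near 0) the policy is pulled strongly toward the controller's action, and where it is badly violated the RL objective dominates. The names `mar_weight` and `mar_regularized_loss`, the exponential form, and the parameters `scale` and `coeff` are all hypothetical illustrations, not the paper's formulation; in a real trainer `rl_loss` would be, for example, a policy-gradient surrogate and all quantities would be differentiable tensors rather than plain floats.

```python
import math
from typing import Sequence


def mar_weight(violation: float, scale: float = 0.1) -> float:
    """Hypothetical MAR weight: 1.0 when the model assumption holds exactly
    (violation == 0), decaying toward 0 as the violation grows."""
    if violation < 0:
        raise ValueError("violation must be non-negative")
    return math.exp(-violation / scale)


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two action vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mar_regularized_loss(
    rl_loss: float,
    policy_actions: Sequence[Sequence[float]],
    controller_actions: Sequence[Sequence[float]],
    violations: Sequence[float],
    coeff: float = 1.0,
    scale: float = 0.1,
) -> float:
    """RL loss plus a batch-averaged imitation penalty, where each state's
    penalty is scaled by how well the controller's model assumption holds."""
    n = len(policy_actions)
    reg = sum(
        mar_weight(v, scale) * squared_error(a_pi, a_mb)
        for a_pi, a_mb, v in zip(policy_actions, controller_actions, violations)
    ) / n
    return rl_loss + coeff * reg


# Usage: the first state satisfies the assumption, the second violates it,
# so only the first state's action mismatch contributes noticeably.
loss = mar_regularized_loss(
    rl_loss=0.5,
    policy_actions=[[0.1, 0.2], [0.5, -0.3]],
    controller_actions=[[0.0, 0.2], [0.0, 0.0]],
    violations=[0.0, 1.0],
)
```

A smooth weight like this avoids switching the regularizer on and off abruptly at a threshold; a hard gate (weight 1 when the assumption holds, 0 otherwise) is the simpler alternative.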
Key results
- Achieves 1.5 m/s forward walking speed on the full-size Digit humanoid robot
- Maintains robust locomotion across slippery, sloped, uneven, and sandy terrains
- Dynamically adjusts regularization to preserve periodic gaits during RL fine-tuning
- Demonstrates successful zero-shot sim-to-real deployment on hardware
Why it matters
Provides a practical, sample-efficient pathway for deploying agile and robust humanoid locomotion policies in unstructured real-world environments.
Abstract
Humanoid locomotion is a challenging task due to its inherent complexity and high-dimensional dynamics, as well as the need to adapt to diverse and unpredictable environments. In this work, we introduce a novel learning framework for effectively training a humanoid locomotion policy that imitates the behavior of a model-based controller while extending its capabilities to handle more complex locomotion tasks, such as more challenging terrain and higher velocity commands. Our framework consists of three key components: pre-training through imitation of the model-based controller, fine-tuning via reinforcement learning, and model-assumption-based regularization (MAR) during fine-tuning. In particular, MAR aligns the policy with actions from the model-based controller only in states where the model assumption holds to prevent catastrophic forgetting. We evaluate the proposed framework through comprehensive simulation tests and hardware experiments on a full-size humanoid robot, Digit, demonstrating a forward speed of 1.5 m/s and robust locomotion across diverse terrains, including slippery, sloped, uneven, and sandy terrains.
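The pre-training component can be illustrated with plain behavior cloning: a policy is fit by regression to the actions a teacher controller produces in sampled states. The sketch below assumes a hypothetical linear feedback law (`teacher_controller`) as a stand-in for the model-based controller and a linear policy trained by stochastic gradient descent on squared action error; the actual framework uses a neural policy and Digit's model-based controller, and its exact imitation procedure may differ.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def teacher_controller(state: Vector) -> Vector:
    """Hypothetical stand-in for the model-based controller:
    a fixed linear feedback law on a 2-D state."""
    return [-2.0 * state[0] - 0.5 * state[1]]


def policy(weights: Matrix, state: Vector) -> Vector:
    """Linear policy: one output per row of the weight matrix."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def pretrain_by_imitation(states: List[Vector], lr: float = 0.05, epochs: int = 200) -> Matrix:
    """Behavior cloning: SGD on the squared error between policy and teacher actions."""
    weights = [[0.0 for _ in states[0]]]  # 1 action dimension x state dimensions
    for _ in range(epochs):
        for s in states:
            target = teacher_controller(s)
            pred = policy(weights, s)
            for i, row in enumerate(weights):
                err = pred[i] - target[i]
                for j in range(len(row)):
                    # d/dw_ij of (pred_i - target_i)^2 = 2 * err * s_j
                    row[j] -= lr * 2.0 * err * s[j]
    return weights


# Usage: sample states, clone the teacher, and recover its feedback gains.
rng = random.Random(0)
states = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(64)]
W = pretrain_by_imitation(states)
```

Because the teacher here is itself linear, the cloned weights converge to its gains; with a real controller and a neural policy, the pre-trained network only approximates the teacher, which is one reason a regularizer toward the controller's actions remains useful during fine-tuning.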