ICRA 2026

PPF: Pre-Training and Preservative Fine-Tuning of Humanoid Locomotion Via Model-Assumption-Based Regularization

Hyunyoung Jung, Zhaoyuan Gu, Ye Zhao, Hae-Won Park, Sehoon Ha

AI summary

A novel two-stage learning framework prevents catastrophic forgetting during reinforcement learning fine-tuning, enabling humanoid robots to achieve high-speed, robust locomotion across diverse real-world terrains.

humanoid locomotion, reinforcement learning, model-assisted learning, catastrophic forgetting, sim-to-real transfer, model-based control

Problem

Humanoid locomotion requires adapting to complex, unpredictable environments, but learning-based policies often suffer from catastrophic forgetting of stable, periodic gaits during reinforcement learning fine-tuning, leading to instability and poor sim-to-real transfer.

Approach

The framework pre-trains a neural policy by imitating a model-based controller, then fine-tunes it via reinforcement learning while dynamically weighting a regularization term based on how much the robot's current state violates the controller's underlying model assumptions.
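As a rough illustration of the weighting idea, the sketch below adds a regularization penalty to an RL loss and scales that penalty per state by how well the controller's model assumption holds. Everything in it is an assumption made for illustration, not the paper's formulation: the scalar `violation` measure, the exponential weighting in `mar_weight`, the squared-error penalty, and the names `mar_penalty`, `fine_tune_loss`, `coeff`, and `scale` are all hypothetical.

```python
import math
from typing import Sequence


def mar_weight(violation: float, scale: float = 1.0) -> float:
    """Map a non-negative assumption-violation score to a weight in (0, 1].

    Hypothetical choice: exponential decay. The weight is 1 when the
    assumption holds exactly (violation == 0) and approaches 0 as the
    violation grows.
    """
    return math.exp(-max(violation, 0.0) / scale)


def mar_penalty(
    policy_actions: Sequence[Sequence[float]],
    controller_actions: Sequence[Sequence[float]],
    violations: Sequence[float],
    scale: float = 1.0,
) -> float:
    """Batch mean of the weighted squared distance between the policy's
    actions and the model-based controller's actions."""
    total = 0.0
    for a_pi, a_mb, v in zip(policy_actions, controller_actions, violations):
        sq_err = sum((p - m) ** 2 for p, m in zip(a_pi, a_mb))
        total += mar_weight(v, scale) * sq_err
    return total / len(policy_actions)


def fine_tune_loss(
    rl_loss: float,
    policy_actions: Sequence[Sequence[float]],
    controller_actions: Sequence[Sequence[float]],
    violations: Sequence[float],
    coeff: float = 1.0,
    scale: float = 1.0,
) -> float:
    """RL loss plus the assumption-weighted regularization term."""
    penalty = mar_penalty(policy_actions, controller_actions, violations, scale)
    return rl_loss + coeff * penalty
```

With this kind of weighting, states where the assumption holds pull the policy toward the controller's actions, while states that violate it strongly contribute almost nothing to the penalty, leaving the RL objective free to shape behavior there.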

Key results

Achieves 1.5 m/s forward walking speed on the full-size Digit humanoid robot
Maintains robust locomotion across slippery, sloped, uneven, and sandy terrains
Dynamically adjusts regularization to preserve periodic gaits during RL fine-tuning
Demonstrates successful zero-shot sim-to-real deployment on hardware

Why it matters

Provides a practical, sample-efficient pathway for deploying agile and robust humanoid locomotion policies in unstructured real-world environments.

Abstract

Humanoid locomotion is a challenging task due to its inherent complexity and high-dimensional dynamics, as well as the need to adapt to diverse and unpredictable environments. In this work, we introduce a novel learning framework for effectively training a humanoid locomotion policy that imitates the behavior of a model-based controller while extending its capabilities to handle more complex locomotion tasks, such as more challenging terrain and higher velocity commands. Our framework consists of three key components: pre-training through imitation of the model-based controller, fine-tuning via reinforcement learning, and model-assumption-based regularization (MAR) during fine-tuning. In particular, MAR aligns the policy with actions from the model-based controller only in states where the model assumption holds to prevent catastrophic forgetting. We evaluate the proposed framework through comprehensive simulation tests and hardware experiments on a full-size humanoid robot, Digit, demonstrating a forward speed of 1.5 m/s and robust locomotion across diverse terrains, including slippery, sloped, uneven, and sandy terrains.

Index terms

Humanoid and Bipedal Locomotion, Reinforcement Learning, Continual Learning