ICRA 2026

Learning Quadruped Walking from Seconds of Demonstration

Ruipeng Zhang, Hongzhan Yu, Ya-Chien Chang, Chenghao Li, Henrik I. Christensen, Sicun Gao

AI summary

Deep neural policies for quadruped walking can be trained stably from just seconds of real-world demonstration data using a novel regularization method that enforces local feedback structure.

quadruped locomotion, imitation learning, latent variation regularization, offline learning, data-efficient control, Poincaré maps

Problem

Training deep neural policies for quadruped locomotion typically demands massive simulation data and struggles with sim-to-real transfer. This paper investigates how much real-world demonstration data is actually needed to train stable walking policies from scratch in a purely offline setting.

Approach

The authors analyze the local linear structure of quadruped dynamics around limit cycles and propose Latent Variation Regularization (LVR), which aligns latent space variations with output action variations to enforce stabilizing feedback without requiring explicit dynamic models.
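As a rough illustration of what "aligning latent variations with action variations" can mean, the sketch below adds an alignment penalty to a behavior-cloning loss. The exact LVR objective is defined in the paper and is not reproduced here. The two-stage policy, the pairwise-distance form of the penalty, the weight `lam`, and the names `policy_forward`, `latent_variation_penalty`, and `regularized_bc_loss` are all hypothetical choices made for this example.

```python
import numpy as np

def policy_forward(params, obs):
    """Hypothetical two-stage policy: observation -> latent z -> action a."""
    W1, b1, W2, b2 = params
    z = np.tanh(obs @ W1 + b1)  # latent features
    a = z @ W2 + b2             # linear action head
    return z, a

def pairwise_dists(v):
    """Euclidean distances over all unordered pairs of rows in v."""
    i, j = np.triu_indices(len(v), k=1)
    return np.linalg.norm(v[i] - v[j], axis=1)

def latent_variation_penalty(z, a, eps=1e-8):
    """Assumed alignment term (not the paper's formula): after normalizing
    by their means, pairwise latent distances should match pairwise
    action distances."""
    dz = pairwise_dists(z)
    da = pairwise_dists(a)
    dz = dz / (dz.mean() + eps)
    da = da / (da.mean() + eps)
    return float(np.mean((dz - da) ** 2))

def regularized_bc_loss(params, obs, act_demo, lam=0.1):
    """Behavior-cloning MSE plus the hypothetical latent-variation term."""
    z, a = policy_forward(params, obs)
    bc = float(np.mean((a - act_demo) ** 2))
    return bc + lam * latent_variation_penalty(z, a)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative sizes only; 12 actions loosely mirrors 12 leg joints.
    obs_dim, latent_dim, act_dim, n = 6, 12, 12, 32
    params = (rng.normal(size=(obs_dim, latent_dim)) * 0.5,
              np.zeros(latent_dim),
              rng.normal(size=(latent_dim, act_dim)) * 0.3,
              np.zeros(act_dim))
    obs = rng.normal(size=(n, obs_dim))
    act_demo = rng.normal(size=(n, act_dim))
    print("loss:", regularized_bc_loss(params, obs, act_demo))
```

This NumPy sketch only evaluates the loss; training a policy with it would additionally need automatic differentiation and an optimizer.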

Key results

Theoretical proof of local linear stabilizability around sparse critical Poincaré sections
Latent Variation Regularization method for offline imitation learning
Stable forward, backward, and sideways walking achieved from seconds of real-world data
Superior robustness and performance over standard behavior cloning
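The first result above concerns local linear stabilizability around Poincaré sections. As a generic illustration of that notion (not the authors' proof, data, or code), one can fit a linearized return map x_next ≈ A x + B u from state deviations at successive section crossings and check whether a linear feedback u = -K x makes the fitted map converge to its fixed point. The synthetic system, the dimensions, the least-squares fit, and the LQR-based choice of K are all assumptions made for this sketch.

```python
import numpy as np

def fit_return_map(x, u, x_next):
    """Least-squares fit of a linearized return map x_next ~ A x + B u.

    Rows of x and x_next are state deviations from the fixed point at
    consecutive section crossings; rows of u are control deviations
    applied in between (assumed setup)."""
    features = np.hstack([x, u])                       # shape (N, n + m)
    theta, *_ = np.linalg.lstsq(features, x_next, rcond=None)
    n = x.shape[1]
    return theta[:n].T, theta[n:].T                    # A: (n, n), B: (n, m)

def dlqr_gain(A, B, Q, R, iters=500):
    """Discrete-time LQR gain by iterating the Riccati recursion from P = Q."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def spectral_radius(M):
    """Largest eigenvalue magnitude; < 1 means iterates of M converge to 0."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, hypothetical return map with an unstable eigenvalue (1.2).
    A_true = np.array([[1.2, 0.1], [0.0, 0.9]])
    B_true = np.array([[1.0], [0.5]])
    x = rng.normal(scale=0.05, size=(40, 2))
    u = rng.normal(scale=0.05, size=(40, 1))
    x_next = x @ A_true.T + u @ B_true.T + rng.normal(scale=1e-4, size=(40, 2))

    A, B = fit_return_map(x, u, x_next)
    K = dlqr_gain(A, B, Q=np.eye(2), R=np.eye(1))
    print("open-loop spectral radius:", spectral_radius(A))
    print("closed-loop spectral radius:", spectral_radius(A - B @ K))
```

A closed-loop spectral radius below 1 for A - B K indicates that the fitted map is locally stabilizable by a linear feedback on the section, even when the open-loop map is not stable.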

Why it matters

Provides a theoretically grounded, data-efficient pathway for training robust quadruped controllers directly from minimal real-world demonstrations, reducing reliance on simulation and complex model-based design.

Abstract

Quadruped locomotion provides a natural setting for understanding when model-free learning can outperform model-based control design, by exploiting data patterns to bypass the difficulty of optimizing over discrete contacts and the combinatorial explosion of mode changes. We give a principled analysis of why imitation learning with quadrupeds can be inherently effective in a small data regime, based on the structure of its limit cycles, Poincaré return maps, and local numerical properties of neural networks. The understanding motivates a new imitation learning method that regulates the alignment between variations in a latent space and those over the output actions. Hardware experiments confirm that a few seconds of demonstration is sufficient to train various locomotion policies from scratch entirely offline with reasonable robustness.

Index terms

Imitation Learning, Legged Robots