Adaptive Legged Locomotion Via Online Learning for Model Predictive Control
Hongyu Zhou, Xiaoyu Zhang, Vasileios Tzoumas
AI summary
Problem
Legged robots struggle to maintain accurate trajectory tracking under real-world uncertainties like unknown payloads and uneven terrain, while existing control methods either require costly offline training or make overly conservative worst-case assumptions.
Approach
The method couples model predictive control with online least-squares estimation over random Fourier features to continuously learn and compensate for unknown dynamics and disturbances in real time.
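As a rough illustration of the random Fourier feature idea, the sketch below checks that an inner product of random cosine features approximates a Gaussian kernel. The kernel choice, input dimension, feature count, and lengthscale are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_in, num_features, lengthscale = 3, 2000, 1.0  # illustrative values only

# Frequencies drawn from the Gaussian kernel's spectral density; phases uniform.
W = rng.normal(scale=1.0 / lengthscale, size=(num_features, dim_in))
b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)


def phi(z):
    """Random Fourier feature map: phi(x) @ phi(y) approximates k(x, y)."""
    return np.sqrt(2.0 / num_features) * np.cos(W @ z + b)


def gaussian_kernel(x, y):
    """Exact Gaussian (RBF) kernel used as the reference value."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * lengthscale ** 2))


x = np.array([0.3, -0.1, 0.5])
y = np.array([0.0, 0.4, 0.2])
approx = phi(x) @ phi(y)        # finite-dimensional approximation
exact = gaussian_kernel(x, y)   # kernel value it should be close to
print(f"approx={approx:.3f}, exact={exact:.3f}")
```

Because the features are finite-dimensional, a model that is linear in `phi(z)` can be fitted with ordinary least squares, which is what makes the online update cheap.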
Key results
- Sublinear dynamic regret guarantee against an optimal clairvoyant controller
- Up to 67% tracking improvement over nominal MPC and 21% over L1-MPC
- Robust performance under unknown payloads (up to 8 kg), time-varying friction, and rough and sloped terrain
- Real-time online learning of residual dynamics via random Fourier features
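For readers unfamiliar with the term, dynamic regret can be written as follows. The notation is assumed for illustration and may differ from the paper's: $c_t$ is the stage cost, $(x_t, u_t)$ is the state-input pair produced by the algorithm, and $(x_t^\star, u_t^\star)$ is the pair produced by the optimal clairvoyant controller.

```latex
\[
\mathrm{Regret}_T^{D}
  = \sum_{t=1}^{T} c_t(x_t, u_t) - \sum_{t=1}^{T} c_t(x_t^\star, u_t^\star),
\qquad
\lim_{T \to \infty} \frac{\mathrm{Regret}_T^{D}}{T} = 0 .
\]
```

The second condition is what "sublinear" means here: the average per-step suboptimality relative to the clairvoyant controller vanishes as the horizon $T$ grows.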
Why it matters
It enables reliable, adaptive legged robot operation in unstructured environments without costly offline training, directly benefiting search-and-rescue, logistics, and industrial inspection applications.
Abstract
We provide an algorithm for adaptive legged locomotion via online learning and model predictive control. The algorithm is composed of two interacting modules: model predictive control (MPC) and online learning of residual dynamics. The residual dynamics can represent modeling errors and external disturbances. We are motivated by the future of autonomy where quadrupeds will autonomously perform complex tasks despite unknown real-world uncertainties, such as unknown payloads and uneven terrain. The algorithm uses random Fourier features to approximate the residual dynamics in reproducing kernel Hilbert spaces. Then, it employs MPC based on the current learned model of the residual dynamics. The model is updated online in a self-supervised manner using least squares based on the data collected while controlling the quadruped. The algorithm enjoys sublinear dynamic regret, defined as the suboptimality against an optimal clairvoyant controller that knows the residual dynamics. We validate our algorithm in Gazebo and MuJoCo simulations, where the quadruped aims to track reference trajectories. The Gazebo simulations include constant unknown external forces up to 12g, where g is the gravity vector, in flat terrain, sloped terrain with 20° inclination, and rough terrain with 0.25 m height variation. The MuJoCo simulations include time-varying unknown disturbances with payload up to 8 kg and time-varying ground friction coefficients in flat terrain.
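The interaction between the two modules can be sketched on a toy problem. The example below is not the paper's quadruped controller: it assumes a scalar system x_next = x + dt * (u + h(x)), a hypothetical disturbance `true_residual`, a one-step-horizon predictive controller as a degenerate stand-in for MPC, and a recursive form of regularized least squares. All names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, steps, num_features = 0.02, 2000, 50  # illustrative values only

# Random Fourier features over the scalar state (Gaussian-kernel frequencies).
W = rng.normal(size=num_features)
b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)


def phi(x):
    return np.sqrt(2.0 / num_features) * np.cos(W * x + b)


def true_residual(x):
    # Hypothetical disturbance, unknown to the controller.
    return -1.5 + 0.8 * np.sin(2.0 * x)


def reference(t):
    return np.sin(0.5 * t)


def run(adapt):
    theta = np.zeros(num_features)    # learned residual weights
    P = np.eye(num_features) / 1e-2   # inverse of the regularized Gram matrix
    x, errors = 0.0, []
    for k in range(steps):
        feat = phi(x)
        h_hat = theta @ feat if adapt else 0.0
        # One-step predictive control: choose u so that the model
        # x + dt * (u + h_hat) lands on the next reference point.
        u = (reference((k + 1) * dt) - x) / dt - h_hat
        x_next = x + dt * (u + true_residual(x))
        # Self-supervised label: motion the nominal model did not explain.
        h_obs = (x_next - x) / dt - u
        if adapt:
            # Recursive least-squares update of theta and P.
            Pf = P @ feat
            gain = Pf / (1.0 + feat @ Pf)
            theta = theta + gain * (h_obs - theta @ feat)
            P = P - np.outer(gain, Pf)
        x = x_next
        errors.append(abs(x - reference((k + 1) * dt)))
    return float(np.mean(errors))


err_nominal = run(adapt=False)
err_adaptive = run(adapt=True)
print(f"mean tracking error: nominal={err_nominal:.4f}, adaptive={err_adaptive:.4f}")
```

The label `h_obs` requires no external supervision: it is computed from the observed state transition and the applied input, which is the sense in which the update is self-supervised.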