ICRA 2026

Enhancing Robustness of Locomotion Policy for Quadrupedal Robot with Deep Disturbance Observer

Fikih Muhamad, Anak Agung Krisna Ananda Kusuma, Jae-Han Park, Jung Su Kim


AI summary


Integrating a deep disturbance observer and state estimator enables RL locomotion policies to robustly handle unseen real-world disturbances without additional tuning.

Quadrupedal robot, Deep disturbance observer, Reinforcement learning, Robust locomotion, State estimation, Simulation-to-reality

Problem

Deep reinforcement learning locomotion policies perform well under nominal conditions but fail when they encounter real-world uncertainties such as external forces, payload changes, and sensor noise. Existing robustness methods struggle with high-dimensional dynamics and with generalization to unseen scenarios.

Approach

The framework trains a locomotion policy together with a deep disturbance observer (approximating the robot's inverse dynamics) and a deep state estimator (predicting privileged states), both built as hybrid LSTM-MLP networks. All networks are trained under nominal conditions in simulation and then deployed, without further tuning, to a second simulator and to a real robot.
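The Approach names hybrid LSTM-MLP networks for the observer and the estimator but gives no layer sizes, inputs, or wiring. The following is a minimal sketch under those gaps, not the authors' architecture: a single-layer LSTM runs over a short history of proprioceptive observations, and a one-hidden-layer ReLU MLP maps its final hidden state to an estimate such as the body's linear velocity. The class names, sizes, and random initialization are hypothetical, and the weights are untrained.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def add(*vs):
    """Element-wise sum of equal-length vectors."""
    return [sum(vals) for vals in zip(*vs)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One (input weights, recurrent weights, bias) triple per gate.
        self.params = {
            gate: (rand_matrix(n_hidden, n_in, rng),
                   rand_matrix(n_hidden, n_hidden, rng),
                   [0.0] * n_hidden)
            for gate in "ifog"
        }

    def step(self, x, h, c):
        def pre(gate):
            w, u, b = self.params[gate]
            return add(matvec(w, x), matvec(u, h), b)

        i, f, o = sigmoid(pre("i")), sigmoid(pre("f")), sigmoid(pre("o"))
        g = tanh(pre("g"))
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


class HybridLSTMMLP:
    """Hypothetical estimator: LSTM encodes the observation history, MLP head decodes it."""

    def __init__(self, n_obs, n_hidden, n_mlp, n_out, seed=0):
        rng = random.Random(seed)
        self.lstm = LSTMCell(n_obs, n_hidden, rng)
        self.w1, self.b1 = rand_matrix(n_mlp, n_hidden, rng), [0.0] * n_mlp
        self.w2, self.b2 = rand_matrix(n_out, n_mlp, rng), [0.0] * n_out

    def forward(self, history):
        """history: list of observation vectors, oldest first. Returns an n_out estimate."""
        h = [0.0] * self.lstm.n_hidden
        c = [0.0] * self.lstm.n_hidden
        for obs in history:
            h, c = self.lstm.step(obs, h, c)
        z = [max(0.0, v) for v in add(matvec(self.w1, h), self.b1)]  # ReLU hidden layer
        return add(matvec(self.w2, z), self.b2)
```

For example, `HybridLSTMMLP(n_obs=12, n_hidden=32, n_mlp=16, n_out=3).forward(history)` would return a three-component vector that, once trained, could play the role of an estimated body linear velocity. Training it by supervised regression against privileged states recorded in simulation is a common choice for such estimators, assumed here rather than confirmed by the page.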

Key results

Handles lateral forces of 200 N, a 5 kg payload, and friction coefficients of 0.2-0.4
Achieves lower velocity tracking errors than baseline PPO methods
Successfully transfers from IsaacGym to Gazebo and a real Unitree Go1 without tuning
Hybrid LSTM-MLP architectures outperform standalone RNNs or MLPs in estimation accuracy
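The results above report lower velocity tracking errors but do not define the metric. One common choice, assumed here rather than taken from the paper, is the root-mean-square error between commanded and measured body velocities (forward, lateral, yaw rate) over an episode. The function name and the tuple layout are hypothetical.

```python
import math


def velocity_tracking_rmse(commanded, measured):
    """RMSE between commanded and measured velocities.

    Both arguments are equal-length sequences of (vx, vy, yaw_rate) tuples;
    the error is pooled over all time steps and all three components.
    """
    if len(commanded) != len(measured) or not commanded:
        raise ValueError("need two non-empty sequences of equal length")
    sq_sum = 0.0
    count = 0
    for cmd, meas in zip(commanded, measured):
        for c, m in zip(cmd, meas):
            sq_sum += (c - m) ** 2
            count += 1
    return math.sqrt(sq_sum / count)
```

Pooling the three components into one number is a design choice; reporting a separate RMSE per axis would show, for instance, whether a lateral push mainly degrades lateral tracking.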

Why it matters

Provides a practical, model-free framework for deploying robust RL locomotion policies on real quadrupedal robots operating in unpredictable environments.

Abstract

This letter proposes a control framework that enhances the robustness of a locomotion policy against uncertainties by integrating it with a deep disturbance observer (DOB) network and a deep state estimator network. The deep DOB approximates the inverse model of a quadrupedal robot. The locomotion policy is trained to produce optimal actions, with the deep DOB estimating the robot's overall uncertainties and the deep state estimator estimating the body's linear velocities. All networks are trained under nominal conditions in IsaacGym. The trained networks are then transferred, without additional tuning, to Gazebo and to a real robot using ROS2 to validate their robustness under uncertain conditions. The validation results show that the proposed control framework outperforms the baseline method in velocity tracking, yielding the lowest errors. This demonstrates the effectiveness of the proposed control framework in improving the robustness of the locomotion policy. Videos of the IsaacGym and Gazebo simulations and of the real-robot experiments are available on the project page: bit.ly/3CF3OTQ.
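The abstract states that the deep DOB approximates the robot's inverse model, but the page does not say how its estimate enters the policy (as an extra observation, an additive action correction, or otherwise). The sketch below illustrates only the classical disturbance-observer principle that the name refers to, on a hypothetical scalar linear plant `x_next = a * x + b * (u + d)`. An inverse nominal model recovers the input that would explain the observed transition; its difference from the applied input is the disturbance estimate; subtracting that estimate from the next command compensates for it. In the paper's framework a learned network takes the place of the analytic inverse `(x_next - a * x) / b`. The plant, its coefficients, the function name, and the compensation law are all illustrative assumptions.

```python
def simulate_dob(steps, a, b, d, u_ref, use_dob):
    """Simulate a scalar plant with a constant input disturbance d.

    With use_dob=True, a classical disturbance observer estimates d from the
    nominal inverse model and subtracts the estimate from the next command.
    Returns the state trajectory and the final disturbance estimate.
    """
    x = 0.0
    d_hat = 0.0
    xs = []
    for _ in range(steps):
        u = u_ref - d_hat if use_dob else u_ref  # compensate with last estimate
        x_next = a * x + b * (u + d)             # true plant: input disturbance d
        if use_dob:
            u_equiv = (x_next - a * x) / b       # nominal inverse model
            d_hat = u_equiv - u                  # input-equivalent disturbance estimate
        x = x_next
        xs.append(x)
    return xs, d_hat
```

With `a=0.9, b=0.1, d=-2.0, u_ref=1.0`, the uncompensated state settles near -1.0, while with the observer the estimate converges to -2.0 after one step and the state settles at the nominal steady state `b * u_ref / (1 - a) = 1.0`. A learned inverse model is what makes this idea applicable when no analytic plant model of a quadruped's high-dimensional dynamics is available.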

Index terms

Reinforcement Learning, Legged Robots, Humanoid and Bipedal Locomotion