U2E: Uncertainty-Aware Modeling and Uncertainty-Guided Exploration with Deep Ensemble for Quadrupedal Robot
Zitong Bai, Yince Gao, Naiyuan Liu, Yiming Huang, Xiaolong Yu, Wei Wang
AI summary
Problem
Policies trained in simulation often fail on real quadrupedal robots due to inaccurate actuator dynamics and the sim-to-real gap, while existing system identification methods rely on manual trajectory design and lack reliable uncertainty quantification.
Approach
The U2E framework uses a deep neural ensemble to model actuator dynamics and quantify epistemic uncertainty, then trains an exploration policy to autonomously collect informative real-world data that iteratively refines the model.
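As a rough illustration of how ensemble disagreement can serve as an epistemic uncertainty estimate, the sketch below trains several models on bootstrap resamples and reports the mean and spread of their predictions. It is a toy under stated assumptions, not the paper's implementation: the members are least-squares lines standing in for neural networks, the one-dimensional data is synthetic, and the names `fit_line`, `train_ensemble`, and `predict` are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def train_ensemble(xs, ys, n_members, rng):
    """Fit each member on its own bootstrap resample of the data."""
    n = len(xs)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return members

def predict(members, x):
    """Return (ensemble mean, ensemble standard deviation) at input x."""
    preds = [a * x + b for a, b in members]
    return statistics.fmean(preds), statistics.pstdev(preds)

# Synthetic stand-in for actuator data: inputs only cover [-1, 1].
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]

ensemble = train_ensemble(xs, ys, n_members=8, rng=rng)
mean_near, std_near = predict(ensemble, 0.0)   # inside the data coverage
mean_far, std_far = predict(ensemble, 10.0)    # far outside the data coverage
print(f"std inside coverage:  {std_near:.4f}")
print(f"std outside coverage: {std_far:.4f}")
```

The members agree where training data is dense and diverge where it is absent, so the spread flags regions with inadequate coverage.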
Key results
- Accurately captures complex nonlinear actuator dynamics with quantified epistemic uncertainty
- Enables autonomous data collection without manually designed excitation trajectories
- Reduces sim-to-real modeling errors and improves locomotion performance on hardware
- Provides a robust uncertainty-aware simulation foundation for downstream reinforcement learning
Why it matters
Enables reliable, automated sim-to-real transfer for agile quadrupedal robots, reducing the need for manual tuning and extensive expert data collection.
Abstract
Reinforcement learning has facilitated agile locomotion in quadrupedal robots. However, most works remain highly dependent on how accurately simulation models describe real-world robot dynamics. Consequently, policy transfer from simulation to hardware is still hindered by the well-known sim-to-real gap, which typically arises from modeling errors and the difficulty of efficiently obtaining informative data in large state-action spaces. To address these challenges, this work proposes U2E, a framework that integrates Uncertainty-aware actuator modeling with an Uncertainty-guided Exploration policy. The actuator model leverages a deep ensemble of neural networks to provide both precise predictions and uncertainty estimates, allowing for the assessment of model confidence and the identification of regions with inadequate data coverage. The exploration strategy then actively guides data collection to autonomously acquire informative real-world samples and refine the actuator models, thereby enhancing compensation for simulation discrepancies. Experiments on quadrupedal locomotion tasks, including jumping and trajectory tracking, demonstrate that our approach reduces the sim-to-real gap and improves performance without depending on manually designed trajectories.
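The idea of steering data collection toward poorly covered regions can be sketched as a greedy selection rule: score each candidate input by ensemble disagreement and pick the highest-scoring one. This is a simplified stand-in, since the paper trains an exploration policy rather than ranking a fixed candidate list. The three linear members, the candidate values, and the function names `disagreement` and `pick_most_informative` are all assumptions made for illustration.

```python
import statistics
from typing import Callable, Sequence

Model = Callable[[float], float]

def disagreement(members: Sequence[Model], x: float) -> float:
    """Standard deviation of member predictions at x (uncertainty proxy)."""
    return statistics.pstdev([m(x) for m in members])

def pick_most_informative(members: Sequence[Model],
                          candidates: Sequence[float]) -> float:
    """Return the candidate input on which the ensemble disagrees most."""
    return max(candidates, key=lambda x: disagreement(members, x))

# Hypothetical ensemble: three members with slightly different slopes,
# so their disagreement grows with the magnitude of the input.
members = [lambda x, a=a: a * x for a in (1.9, 2.0, 2.1)]
candidates = [-0.5, 0.1, 3.0, 1.0]

chosen = pick_most_informative(members, candidates)
print(chosen)  # 3.0
```

In an iterative loop, the sample collected at the chosen input would be added to the training set and the ensemble refit before the next selection.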