Learning Locomotion for Quadruped Robots Via Distributional Ensemble Actor-Critic
Sicen Li, YiMing Pang, Panju Bai, Jiawei Li, Zhaojin Liu, Shihao Hu, Li-Quan Wang, Gang Wang
Abstract
Domain randomization introduces perturbations in the simulation to make controllers less susceptible to the reality gap, which enables remarkable sim-to-real transfer on real quadruped robots. However, aleatoric uncertainty originating from perturbations could often lead to suboptimal controllers. In this work, we present a novel algorithm called Distributional Ensemble Actor-Critic (DEAC) that blends three ideas: distributional representation of a critic, lower bounds of the value distribution, and ensembling of multiple critics and actors. Distributional representation and ensembling provide reasonable uncertainty estimates, while lower bounds of the value distribution offer finer-grained error control. The simulation results show that the controller trained by DEAC outperforms the other baselines in the domain randomization setting. The trained controller is deployed on an A1-like robot, demonstrating high-speed running and the ability to traverse diverse terrains such as slippery plates, grassland, and wet dirt.
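To make the combination of the three ideas concrete, the following is a minimal illustrative sketch and not the DEAC update rule itself, which the abstract does not specify. It assumes each critic represents the return distribution by a list of quantile values, takes the mean of the lowest quantiles as a per-critic lower bound, and subtracts a multiple of the ensemble's disagreement. The function names and the parameters `fraction` and `beta` are hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def lower_quantile_mean(quantiles, fraction=0.5):
    """Mean of the lowest `fraction` of a critic's quantile values.

    This is a CVaR-style lower bound, assumed here for illustration only.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(quantiles)
    # Keep at least one quantile so the mean is always defined.
    k = max(1, int(len(ordered) * fraction))
    return mean(ordered[:k])


def conservative_value(ensemble_quantiles, fraction=0.5, beta=1.0):
    """Pessimistic value estimate from an ensemble of quantile critics.

    ensemble_quantiles: one list of quantile values per critic, all for
    the same state-action pair. `beta` (hypothetical) scales the penalty
    for disagreement between critics.
    """
    per_critic = [lower_quantile_mean(q, fraction) for q in ensemble_quantiles]
    # Ensemble mean minus a multiple of the spread across critics.
    return mean(per_critic) - beta * pstdev(per_critic)


if __name__ == "__main__":
    # Two toy critics, four quantiles each.
    ensemble = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
    print(conservative_value(ensemble, fraction=0.5, beta=1.0))
```

In this sketch the spread within each critic's quantiles stands in for aleatoric uncertainty, while the disagreement between critics stands in for uncertainty about the estimate itself. Setting `beta` to zero removes the ensemble penalty, and setting `fraction` to one reduces each critic to its plain mean.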