Learning Humanoid Loco-Manipulation with Constraints As Terminations
Pierre-Alexandre Leziart, Mitsuharu Morisawa, Fumio Kanehiro
Abstract
Deep Reinforcement Learning (RL) is now commonly used for controlling legged robots. Several recent studies have demonstrated impressive results in solving increasingly complex robotic tasks such as navigation in unstructured environments or loco-manipulation. However, this complexity often comes with intricate learning setups requiring tedious reward shaping and features to help convergence. In this work, we tackle these issues and achieve loco-manipulation with a humanoid robot using an RL algorithm that enforces constraints through stochastic terminations during policy learning. We keep the number of rewards low by reformulating them as constraints when they can be intuitively expressed that way. Moreover, we study the relevance of various learning features encountered in the literature and show that providing the critic with either noise-free observations or privileged information are two straightforward ways to boost locomotion performance on rough terrain. We also demonstrate that the proposed minimalist architecture is not limited to pure locomotion but extends to a loco-manipulation task involving upper limbs. Videos are available at humanoid-cat.github.io.