Learning Setup Policies: Reliable Transition between Locomotion Behaviours
Tidd, Brendan; Leitner, Jurgen; Cosgun, Akansel; Hudson, Nicolas
Abstract
Dynamic platforms that operate over many unique terrain conditions typically require many behaviours. Transitioning safely between behaviours requires an overlap of states between adjacent controllers. We develop a novel method for training setup policies that bridge the trajectories between pre-trained Deep Reinforcement Learning (DRL) policies. We demonstrate our method with a simulated biped traversing a difficult jump terrain, where a single policy fails to learn the task, and switching between pre-trained policies without setup policies also fails. We perform an ablation of key components of our system, and show that our method outperforms others that learn transition policies. We demonstrate our method with several difficult and diverse terrain types, and show that we can use setup policies as part of a modular control suite to successfully traverse a sequence of complex terrains. We show that using setup policies improves the success rate for traversing a single difficult jump terrain (from 51.3% with the best comparative method to 82.2%), and for traversing a random sequence of difficult obstacles (from 1.9% without setup policies to 71.2%).