ICRA 2024

Maximizing Quadruped Velocity by Minimizing Energy

Srinath Mahankali, Chi-Chang Lee, Gabriel Margolis, Zhang-Wei Hong, Pulkit Agrawal


Abstract

Reinforcement Learning (RL) has been a powerful tool for training robots to acquire agile locomotion skills. To learn locomotion, it is commonly necessary to introduce additional reward-shaping terms, such as an energy minimization term, to guide an algorithm like Proximal Policy Optimization (PPO) to good performance. Prior work relies on tuning the weights of these reward-shaping terms to obtain satisfactory task performance. To avoid the effort of tuning these weights, we adopt the Extrinsic-Intrinsic Policy Optimization (EIPO) framework. The key idea of EIPO is to establish a constrained optimization framework with the primary objective of enhancing task performance and the secondary objective of minimizing energy consumption. It seeks a policy that minimizes the energy consumption objective within the set of policies that are optimal for task performance. This guarantees that the learned policy excels in task performance while conserving energy, without requiring manual adjustment of the weights on the two objectives. Our experiments evaluate EIPO on various quadruped locomotion tasks, revealing that policies trained with EIPO consistently achieve higher task performance than PPO baselines while maintaining comparable energy consumption levels. Furthermore, EIPO exhibits superior task performance to PPO in real-world evaluations.
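
The contrast drawn in the abstract can be written out as a short sketch. The notation here is an assumption introduced for illustration and is not taken from the paper: J_task(π) denotes the expected task return of policy π, J_energy(π) its expected energy consumption, and λ the hand-tuned shaping weight. The first line is the weighted-sum objective that requires tuning; the second is the constrained form the abstract describes, in which energy is minimized only among task-optimal policies.

```latex
% Weighted-sum reward shaping: \lambda must be tuned by hand (assumed notation).
\max_{\pi} \; J_{\text{task}}(\pi) - \lambda \, J_{\text{energy}}(\pi), \qquad \lambda > 0

% Constrained form described in the abstract: task performance is primary,
% energy is minimized only within the set of task-optimal policies.
\Pi^{*} = \arg\max_{\pi} J_{\text{task}}(\pi), \qquad
\pi^{\star} \in \arg\min_{\pi \in \Pi^{*}} J_{\text{energy}}(\pi)
```

In the second form no weight λ appears, which is the sense in which the trade-off no longer needs manual tuning.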

Index terms

Reinforcement Learning, Legged Robots, Sensorimotor Learning