ICRA 2024

HyperPPO: A Scalable Method for Finding Small Policies for Robotic Control

Shashank Hegde, Zhehui Huang, Gaurav Sukhatme

Abstract

Models with fewer parameters are necessary for the neural control of memory-limited, performant robots. Finding these smaller neural network architectures can be time-consuming. We propose HyperPPO, an on-policy reinforcement learning algorithm that utilizes graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than those in common-use networks yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency and provide the user the choice of picking a network architecture that satisfies their computational constraints. We show that our method scales well: more training resources produce faster convergence to higher-performing architectures. We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor. Website: https://sites.google.com/usc.edu/hyperppo
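As a rough illustration of what picking an architecture under a computational constraint could look like, the sketch below counts the parameters of fully connected policies and keeps only those that fit a budget. It is not taken from HyperPPO: the function names, the restriction to plain MLPs, and the observation and action dimensions are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def mlp_param_count(obs_dim: int, hidden: Sequence[int], act_dim: int) -> int:
    """Count weights and biases of an MLP: obs_dim -> hidden layers -> act_dim."""
    sizes = [obs_dim, *hidden, act_dim]
    # Each dense layer contributes n_in * n_out weights and n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def architectures_within_budget(
    candidates: Sequence[Sequence[int]],
    obs_dim: int,
    act_dim: int,
    max_params: int,
) -> List[Tuple[Tuple[int, ...], int]]:
    """Return (hidden_sizes, param_count) pairs that fit the budget, smallest first."""
    sized = [(tuple(h), mlp_param_count(obs_dim, h, act_dim)) for h in candidates]
    return sorted((pair for pair in sized if pair[1] <= max_params), key=lambda p: p[1])


# Hypothetical dimensions and candidate hidden-layer layouts, chosen only for illustration.
OBS_DIM, ACT_DIM = 18, 4
CANDIDATES = [[16, 16], [4, 4], []]

feasible = architectures_within_budget(CANDIDATES, OBS_DIM, ACT_DIM, max_params=200)
for hidden, n_params in feasible:
    print(hidden, n_params)
```

With a set of policies trained at once, a user would then compare the returns of the feasible architectures and deploy the best one that fits the target hardware; the comparison step is omitted here because it depends on training results.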

Index terms

Reinforcement Learning; Machine Learning for Robot Control; Swarm Robotics