ICRA 2023

Multi-Alpha Soft Actor-Critic: Overcoming Stochastic Biases in Maximum Entropy Reinforcement Learning

Conor Igoe, Swapnil Pande, Siddarth Venkatraman, Jeff Schneider


Abstract

The successful application of robotic control requires intelligent decision-making to handle the long tail of complex scenarios that arise in real-world environments. Recently, Deep Reinforcement Learning (DRL) has provided a data-driven framework to automatically learn effective policies in such complex settings. Since its introduction in 2018, Soft Actor-Critic (SAC) has remained one of the most popular off-policy DRL algorithms and has been used extensively to learn performant robotic control policies. However, in this paper we argue that by relying on the maximum entropy formalism to define learning objectives, previous work introduces a significant bias away from optimal decision-making, which often requires near-deterministic behaviour in high-precision tasks. Moreover, we show that when training with the original variants of SAC, overcoming this bias by reducing entropy budgets or entropy coefficients introduces separate issues that lead to slow or unstable learning. We address these shortcomings by treating the entropy coefficient α as a random variable, and introduce Multi-Alpha Soft Actor-Critic (MAS). We show how MAS overcomes the stochastic bias of SAC in a variety of robotic control tasks, including the CARLA urban-driving simulator, while maintaining the stability and sample efficiency of the original algorithms.
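The stochastic bias described above can be illustrated in the simplest possible setting: a single-state (bandit) problem with the standard maximum entropy objective E_π[r(a)] + α·H(π). Its optimal policy is the Boltzmann distribution π(a) ∝ exp(r(a)/α), which stays stochastic for any α > 0. The sketch below is a minimal illustration under that assumption only. The reward vector and the helper names `max_entropy_policy` and `expected_reward` are hypothetical, and the code is not the MAS method itself. It shows how much expected reward the entropy-regularized optimum gives up for several fixed values of α.

```python
import numpy as np

def max_entropy_policy(rewards, alpha):
    """Optimal policy of the single-state maximum entropy objective
    E_pi[r(a)] + alpha * H(pi), i.e. pi(a) proportional to exp(r(a) / alpha)."""
    logits = np.asarray(rewards, dtype=float) / alpha
    logits -= logits.max()  # subtract the max before exponentiating for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def expected_reward(rewards, probs):
    """Expected reward of a discrete policy given per-action rewards."""
    return float(np.dot(rewards, probs))

# Hypothetical "high-precision" bandit: one correct action, four near-misses.
rewards = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

for alpha in [1.0, 0.5, 0.1, 0.01]:
    pi = max_entropy_policy(rewards, alpha)
    gap = rewards.max() - expected_reward(rewards, pi)  # reward lost to entropy-induced randomness
    print(f"alpha={alpha:<5} P(best)={pi[0]:.4f} reward gap={gap:.2e}")
```

Running this shows the gap shrinking as α decreases. This is why reducing the entropy coefficient is the natural way to remove the bias, even though, as the abstract notes, doing so in the original SAC variants brings its own training difficulties.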

Index terms

Reinforcement Learning