Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization
Fan Yang, Wenxuan Zhou, Zuxin Liu, Ding Zhao, David Held
Abstract
Safe Reinforcement Learning (RL) plays an important role in applying RL algorithms to safety-critical real-world applications, where it must balance the trade-off between maximizing rewards and adhering to safety constraints. This work introduces a novel approach that combines RL with trajectory optimization to manage this trade-off. Our approach embeds safety constraints within the action space of a modified Markov Decision Process (MDP). The RL agent produces a sequence of actions, which a trajectory optimizer transforms into safe trajectories, thereby ensuring safety and improving training stability. On challenging Safety Gym tasks, the approach achieves significantly higher rewards and near-zero safety violations during inference. We demonstrate its real-world applicability through a safe and effective deployment on a real robot task of pushing a box around obstacles. Videos and the appendix are available on our website: https://sites.google.com/view/safemdp.
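To make the "actions to safe trajectory" step concrete, the following is a minimal illustrative sketch, not the authors' optimizer. It assumes (hypothetically) that the agent's action is a list of 2D waypoints and that hazards are well-separated discs `(center_x, center_y, radius)`. The hypothetical function `safe_trajectory` runs projected gradient descent: it keeps the path close to the proposed waypoints and smooth, then projects every waypoint outside the hazards, so the returned trajectory always satisfies the disc constraints.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Hazard = Tuple[float, float, float]  # hypothetical: (center_x, center_y, radius)


def project_outside(p: Point, hazards: List[Hazard], margin: float) -> Point:
    """Push a point radially out of any hazard disc it lies in.

    Assumes the discs are well separated, so leaving one disc never
    places the point inside another.
    """
    x, y = p
    for cx, cy, r in hazards:
        dx, dy = x - cx, y - cy
        d = math.hypot(dx, dy)
        limit = r + margin
        if d < limit:
            if d < 1e-9:  # point at the centre: pick an arbitrary direction
                dx, dy, d = 1.0, 0.0, 1.0
            x, y = cx + dx / d * limit, cy + dy / d * limit
    return (x, y)


def safe_trajectory(proposed: List[Point], hazards: List[Hazard],
                    margin: float = 0.05, smooth: float = 1.0,
                    lr: float = 0.05, iters: int = 200) -> List[Point]:
    """Hypothetical trajectory optimizer (projected gradient descent).

    Objective: sum_i ||x_i - a_i||^2 + smooth * sum_i ||x_{i+1} - x_i||^2,
    subject to every x_i lying at least `margin` outside each hazard disc.
    """
    traj = [project_outside(p, hazards, margin) for p in proposed]
    n = len(traj)
    for _ in range(iters):
        new = []
        for i in range(n):
            x, y = traj[i]
            ax, ay = proposed[i]
            # Gradient of the tracking term ||x_i - a_i||^2.
            gx, gy = 2 * (x - ax), 2 * (y - ay)
            # Gradient of the smoothness term with respect to x_i.
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    gx += 2 * smooth * (x - traj[j][0])
                    gy += 2 * smooth * (y - traj[j][1])
            # Gradient step, then projection back onto the safe set.
            new.append(project_outside((x - lr * gx, y - lr * gy), hazards, margin))
        traj = new
    return traj


if __name__ == "__main__":
    hazards = [(0.0, 0.0, 1.0)]
    proposed = [(-2.0, 0.0), (0.0, 0.2), (2.0, 0.0)]  # middle waypoint is unsafe
    for px, py in safe_trajectory(proposed, hazards):
        print(f"({px:.3f}, {py:.3f})")
```

Because projection is the last operation in every iteration, the output respects the (assumed) disc constraints even though the reward-seeking proposal does not; this mirrors, in a simplified form, the idea that the agent acts freely while safety is enforced inside the action-to-trajectory mapping.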