Continual-RL for Generalization in Autonomous Racing on the RoboRacer Platform
Joel Siegert, Edoardo Ghignone, Michele Magno
AI summary
Problem
Real-world reinforcement learning struggles with sample efficiency and catastrophic forgetting when adapting to new, unseen environments. Autonomous racing specifically demands rapid policy updates to novel track layouts and tire-floor combinations with minimal physical data.
Approach
The authors extend the sample-efficient Soft Actor-Critic (SAC) algorithm with Continual Backpropagation (CBP) and L2 Init regularization (a penalty that pulls weights back toward their initial values) to maintain neural plasticity while learning from multiple real-world tracks. They also benchmark this against an offline RL pre-training method based on Implicit Q-Learning (IQL).
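The two plasticity mechanisms named above can be illustrated with a minimal NumPy sketch. It assumes a single ReLU hidden layer and a simplified version of CBP's generate-and-test rule. Each unit keeps a running "contribution utility" (activation magnitude times outgoing-weight magnitude). A small fraction of mature, low-utility units then have their incoming weights redrawn from the initial distribution and their outgoing weights zeroed. L2 Init is shown as the penalty λ‖θ − θ₀‖². The class and function names (`CBPHiddenLayer`, `l2_init_penalty`), the hyperparameter values, and the simplified utility are illustrative assumptions, not the authors' implementation. How these pieces plug into the SAC actor and critic is not specified in this section.

```python
import numpy as np


class CBPHiddenLayer:
    """Hypothetical, simplified Continual Backpropagation bookkeeping for one ReLU layer.

    W_in: (n_in, n_hidden) incoming weights; W_out: (n_hidden, n_out) outgoing weights.
    """

    def __init__(self, n_in, n_hidden, n_out, replacement_rate=1e-4,
                 maturity_threshold=100, decay=0.99, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_in = n_in
        self.W_in = self._fresh_inputs(n_hidden)
        self.b_in = np.zeros(n_hidden)
        self.W_out = self.rng.uniform(-1.0, 1.0, (n_hidden, n_out)) / np.sqrt(n_hidden)
        self.utility = np.zeros(n_hidden)          # running contribution utility per unit
        self.age = np.zeros(n_hidden, dtype=int)   # updates since the unit was (re)initialized
        self.replacement_rate = replacement_rate   # illustrative value (assumption)
        self.maturity_threshold = maturity_threshold
        self.decay = decay
        self.pending = 0.0  # fractional number of units still owed a reset

    def _fresh_inputs(self, n_units):
        # Draw from the same uniform distribution used at initialization.
        bound = 1.0 / np.sqrt(self.n_in)
        return self.rng.uniform(-bound, bound, (self.n_in, n_units))

    def forward(self, x):
        h = np.maximum(0.0, x @ self.W_in + self.b_in)
        return h, h @ self.W_out

    def update_utility(self, h):
        # Contribution utility: batch-mean |activation| times summed |outgoing weights|.
        contrib = np.abs(h).mean(axis=0) * np.abs(self.W_out).sum(axis=1)
        self.utility = self.decay * self.utility + (1.0 - self.decay) * contrib
        self.age += 1

    def reinit_low_utility_units(self):
        # Only units older than the maturity threshold are eligible for replacement.
        mature = np.flatnonzero(self.age > self.maturity_threshold)
        self.pending += self.replacement_rate * len(mature)
        n = int(self.pending)
        if n == 0:
            return np.array([], dtype=int)
        self.pending -= n
        worst = mature[np.argsort(self.utility[mature])[:n]]
        self.W_in[:, worst] = self._fresh_inputs(len(worst))
        self.b_in[worst] = 0.0
        self.W_out[worst, :] = 0.0  # new units start with no effect on the output
        self.utility[worst] = 0.0
        self.age[worst] = 0
        return worst


def l2_init_penalty(params, init_params, lam=1e-2):
    """L2 Init: lam * sum ||theta - theta_0||^2 and its gradient w.r.t. each theta."""
    penalty = lam * sum(np.sum((p - p0) ** 2) for p, p0 in zip(params, init_params))
    grads = [2.0 * lam * (p - p0) for p, p0 in zip(params, init_params)]
    return penalty, grads


if __name__ == "__main__":
    layer = CBPHiddenLayer(n_in=8, n_hidden=32, n_out=2)
    theta0 = [layer.W_in.copy(), layer.b_in.copy(), layer.W_out.copy()]
    x = np.random.default_rng(1).normal(size=(64, 8))
    for step in range(500):
        h, y = layer.forward(x)
        # A real agent would take a gradient step here on its SAC loss plus the
        # L2 Init penalty; this sketch only performs the CBP bookkeeping.
        layer.update_utility(h)
        layer.reinit_low_utility_units()
    penalty, _ = l2_init_penalty([layer.W_in, layer.b_in, layer.W_out], theta0)
    print("L2 Init penalty after resets:", penalty)
```

Zeroing the outgoing weights of a reset unit means a reset does not perturb the network's current output, while the maturity threshold keeps freshly reset (and therefore low-utility) units from being replaced again immediately.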
Key results
- CBP-enhanced SAC surpasses classical controllers after 15 minutes of fine-tuning on unseen tracks
- Offline RL pre-training shows promising plasticity but lower final performance than continual learning
- Simulation analysis confirms continual techniques improve fine-tuning over buffer management alone
- Tracks, simulation models, and RL frameworks open-sourced for replication
Why it matters
Provides a practical, sample-efficient pathway for deploying adaptable RL controllers on physical robots in non-stationary environments, directly benefiting autonomous racing and real-world robotics research.
Abstract
A key challenge in modern robotics is adapting to changing environments. The challenge is exacerbated when simulation cannot cover every possible real-world configuration, which makes Reinforcement Learning (RL) in the physical world necessary. Continual RL provides the tools to address this challenge; however, both its frameworks and its methods remain underexplored. Autonomous Racing (AR), and in particular the RoboRacer competition, provides a testing ground for such methods, since learning to drive on a new track-floor combination with the least amount of new experience is naturally a continual learning problem. This work addresses this gap by proposing a continual RL framework based on Continual Backpropagation (CBP) that, using only real-world data, trains a generalist policy on a set of tracks and then fine-tunes it within 15 minutes to outperform classical controllers. Furthermore, a comparison method based on offline RL is proposed, and a simulation analysis of the plasticity properties of the methods is conducted.
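The offline RL comparison method is built on Implicit Q-Learning (IQL), according to the summary above. The minimal NumPy sketch below shows IQL's three loss ingredients in their commonly used form. An expectile loss fits V(s) to Q(s, a) using only dataset actions. The Q target bootstraps from V(s') so no out-of-distribution action is ever queried. Exponentiated advantages weight the policy-extraction regression. The function names and the values of `tau`, `beta`, `gamma`, and the weight clip are illustrative assumptions; the paper's networks and hyperparameters are not given in this section.

```python
import numpy as np


def expectile_loss(q, v, tau=0.7):
    """IQL value loss: asymmetric squared error on u = Q(s, a) - V(s).

    Weight tau when u >= 0 and (1 - tau) when u < 0; tau > 0.5 pushes V toward an
    upper expectile of Q over the actions present in the dataset.
    """
    u = q - v
    weight = np.where(u < 0.0, 1.0 - tau, tau)
    return np.mean(weight * u ** 2)


def q_target(reward, next_v, done, gamma=0.99):
    """TD target for the Q-network, bootstrapped from V(s') instead of max_a Q(s', a)."""
    return reward + gamma * (1.0 - done) * next_v


def awr_weights(q, v, beta=3.0, max_weight=100.0):
    """Clipped advantage weights exp(beta * (Q - V)) for policy extraction."""
    return np.minimum(np.exp(beta * (q - v)), max_weight)


if __name__ == "__main__":
    # Fit a scalar V to fixed Q samples by gradient descent on the expectile loss.
    q_samples = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    for tau in (0.5, 0.9):
        v = 0.0
        for _ in range(2000):
            u = q_samples - v
            weight = np.where(u < 0.0, 1.0 - tau, tau)
            v += 0.1 * np.mean(2.0 * weight * u)  # step along the negative gradient w.r.t. v
        print(tau, round(v, 3), expectile_loss(q_samples, v, tau))
```

With `tau = 0.5` the loss is symmetric and the fitted value is the sample mean of the Q samples; larger `tau` moves it toward the upper end of the samples. This lets IQL approximate a maximum over in-dataset actions without evaluating unseen ones.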