Beyond Reactive Adaptation: Long-Horizon Memory for Autonomous Racing Via State Space Models
Grzegorz Czechmanowski, Jan Węgrzynowski, Piotr Kicki, Krzysztof, Tadeusz Walas
AI summary
Problem
Current RL racing policies rely either on ground-truth system identification, which is impractical in the real world, or on short-horizon reactive adaptation, which cannot remember spatial disturbances such as slippery zones across laps.
Approach
We replace standard MLPs and RNNs with a Mamba State Space Model that fuses vehicle kinematics with Fourier positional encodings, creating a persistent hidden state to map and memorize localized track conditions online.
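The policy input combines the car's kinematic state with Fourier features of its position. The sketch below shows one common way to build such features. It assumes the position is given as planar (x, y) coordinates in metres. The number of frequency bands, the `scale` normaliser, and fusion by plain concatenation are hypothetical choices, not details taken from the paper.

```python
import math
from typing import List, Sequence


def fourier_features(x: float, y: float, num_bands: int = 4, scale: float = 10.0) -> List[float]:
    """Encode a 2D track position as sin/cos pairs at geometrically spaced frequencies.

    Hypothetical parameters: `num_bands` (frequencies per coordinate) and
    `scale` (rough track extent in metres, used to normalise positions).
    """
    feats: List[float] = []
    for coord in (x, y):
        u = coord / scale  # normalise position to roughly [-1, 1]
        for k in range(num_bands):
            freq = (2.0 ** k) * math.pi  # frequencies pi, 2pi, 4pi, ...
            feats.append(math.sin(freq * u))
            feats.append(math.cos(freq * u))
    return feats


def policy_input(vehicle_state: Sequence[float], x: float, y: float) -> List[float]:
    """Fuse kinematic state with positional features by concatenation (an assumption)."""
    return list(vehicle_state) + fourier_features(x, y)


if __name__ == "__main__":
    # Hypothetical kinematic state: [v_x, v_y, yaw_rate]
    obs = policy_input([4.2, 0.1, 0.3], x=2.0, y=-3.0)
    print(len(obs))  # 3 kinematic values + 2 coords * 4 bands * 2 (sin, cos)
```

Because the same track location always maps to the same feature vector, a recurrent or state-space policy can associate what it experienced there (for example, unexpected sliding) with that spot and recall it on later laps.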
Key results
- Successful memorization of localized friction patches across laps
- Lap times come within 0.1 seconds of the oracle baseline
- Consistent lap-to-lap improvement, outperforming MLP and RNN baselines
- Effective in-context meta-learning for dynamic track adaptation
Why it matters
Enables autonomous racing agents to proactively exploit track conditions over long horizons, bridging the gap between reactive control and episodic learning for safer, faster real-world deployment.
Abstract
Autonomous racing pushes vehicles to their physical limits, requiring control policies that can rapidly adapt to localized changes in track conditions, such as varying surface friction. Current Reinforcement Learning (RL) approaches rely either on ground-truth system identification, which is impractical in the real world, or on short-horizon reactive adaptation (e.g., Rapid Motor Adaptation (RMA)), which cannot remember spatial disturbances across multiple laps. In this extended abstract, we propose a novel RL architecture for autonomous racing based on Mamba, a structured State Space Model (SSM). By fusing the vehicle state with Fourier features of the vehicle's position on the racetrack, our Mamba-based policy builds a long-horizon episodic memory. This allows the policy not only to adapt to unknown friction online but also to map and memorize slippery zones for future laps. Evaluated in a simulated F1Tenth environment, our approach demonstrates continuous lap-to-lap improvement, approaching the performance of an "oracle" policy trained on exact ground-truth friction, whereas standard Multi-Layer Perceptron (MLP) and Recurrent Neural Network (RNN) baselines plateau at lower performance levels.
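The following toy example shows how an SSM hidden state can retain information over long horizons. It is a heavily simplified, single-channel selective scan in the spirit of Mamba, not the paper's architecture. Only the step size Δ depends on the input here. In Mamba, B and C are input-dependent as well, inputs are multi-channel, and all parameters are learned. The class name, weights, and values below are hypothetical. The update is h_t = exp(Δ_t A) h_{t-1} + Δ_t B x_t with diagonal, negative A, and the output is y_t = C · h_t.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    # Numerically stable log(1 + e^z); keeps the step size Delta positive
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class ToySelectiveSSM:
    """Single-channel diagonal SSM with an input-dependent step size (hypothetical toy)."""

    def __init__(self, a: Sequence[float], b: Sequence[float], c: Sequence[float],
                 w_delta: float = 1.0, bias_delta: float = 0.0) -> None:
        if not (len(a) == len(b) == len(c)):
            raise ValueError("a, b, c must share the same state dimension")
        if any(ai >= 0 for ai in a):
            raise ValueError("diagonal A must be negative for a stable recurrence")
        self.a, self.b, self.c = list(a), list(b), list(c)
        self.w_delta, self.bias_delta = w_delta, bias_delta
        self.h: List[float] = [0.0] * len(self.a)  # carried across steps, never reset here

    def step(self, x: float) -> float:
        delta = softplus(self.w_delta * x + self.bias_delta)  # selective step size
        for i in range(len(self.h)):
            a_bar = math.exp(delta * self.a[i])  # zero-order-hold decay factor in (0, 1)
            self.h[i] = a_bar * self.h[i] + delta * self.b[i] * x  # simplified B discretization
        return sum(ci * hi for ci, hi in zip(self.c, self.h))


if __name__ == "__main__":
    # One slow mode (a = -0.01) and one fast mode (a = -1.0)
    ssm = ToySelectiveSSM(a=[-0.01, -1.0], b=[1.0, 1.0], c=[1.0, 1.0])
    ssm.step(1.0)             # a single "event", e.g. a slip at some track location
    for _ in range(100):
        ssm.step(0.0)         # 100 uneventful steps afterwards
    print(ssm.h)              # slow mode keeps about half its value; fast mode is ~0
```

With zero input, Δ = softplus(0) = ln 2, so the slow mode decays by exp(-0.01 ln 2) per step and halves after 100 steps, while the fast mode halves every step. State dimensions with A close to zero therefore act as long-horizon memory. Making Δ input-dependent lets a trained model decide, step by step, how much to overwrite or preserve.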