ICRA 2026

Beyond Reactive Adaptation: Long-Horizon Memory for Autonomous Racing Via State Space Models

Grzegorz Czechmanowski, Jan Węgrzynowski, Piotr Kicki, Krzysztof Tadeusz Walas

AI summary

A Mamba-based reinforcement learning policy enables autonomous racers to memorize and adapt to track friction across multiple laps, approaching oracle-level performance.

State Space Models, Autonomous Racing, Reinforcement Learning, Long-Horizon Memory, Mamba Architecture, Sim-to-Real Adaptation

Problem

Current RL racing policies rely either on ground-truth system identification, which is impractical in the real world, or on short-horizon reactive adaptation, which cannot remember spatial disturbances such as slippery zones across laps.

Approach

We replace standard MLPs and RNNs with a Mamba State Space Model that fuses vehicle kinematics with Fourier positional encodings, creating a persistent hidden state to map and memorize localized track conditions online.
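
As a rough illustration of the input fusion described above, the sketch below encodes track position as sine/cosine pairs and appends them to a vehicle state vector. It is only a sketch under stated assumptions, not the authors' implementation: it assumes position is expressed as arc-length progress `s` along a closed track, uses power-of-two frequencies, and the names `fourier_position_features`, `build_observation`, `track_length`, and `num_frequencies` are all hypothetical.

```python
import math


def fourier_position_features(s, track_length, num_frequencies=4):
    """Encode track progress s as sin/cos pairs (assumed encoding, not from the paper)."""
    # Map progress to a phase in [0, 2*pi); wrapping makes every lap look the same.
    phase = 2.0 * math.pi * (s % track_length) / track_length
    features = []
    for k in range(num_frequencies):
        freq = 2 ** k  # integer frequencies keep the features periodic over one lap
        features.append(math.sin(freq * phase))
        features.append(math.cos(freq * phase))
    return features


def build_observation(vehicle_state, s, track_length, num_frequencies=4):
    """Concatenate the vehicle state with the positional features."""
    return list(vehicle_state) + fourier_position_features(s, track_length, num_frequencies)


# Hypothetical vehicle state (speed, yaw rate, slip angle) at the same spot on two laps.
obs_lap1 = build_observation([3.2, 0.1, -0.05], s=12.5, track_length=50.0)
obs_lap2 = build_observation([3.2, 0.1, -0.05], s=62.5, track_length=50.0)
```

Because the encoding is periodic in lap distance, the same physical location yields the same positional features on every lap, which is what would let a sequence model associate a remembered friction estimate with a place on the track.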

Key results

Successful memorization of localized friction patches across laps
Lap times come within 0.1 seconds of the oracle baseline
Consistent lap-to-lap improvement, outperforming MLP and RNN baselines
Effective in-context meta-learning for dynamic track adaptation

Why it matters

Enables autonomous racing agents to proactively exploit track conditions over long horizons, bridging the gap between reactive control and episodic learning for safer, faster real-world deployment.

Abstract

Autonomous racing pushes vehicles to their physical limits, requiring control policies that can rapidly adapt to localized changes in track conditions, such as varying surface friction. Current Reinforcement Learning (RL) approaches rely either on ground-truth system identification, which is impractical in the real world, or short-horizon reactive adaptations (e.g., Rapid Motor Adaptation (RMA)) that cannot remember spatial disturbances across multiple laps. In this extended abstract, we propose a novel RL architecture based on Mamba, a structured State Space Model (SSM), for autonomous racing. By fusing vehicle state with Fourier features of vehicle position on the racetrack, our Mamba-based policy builds a long-horizon episodic memory. This allows the policy not only to adapt to unknown friction online but also to map and memorize slippery zones for future laps. Evaluated in a simulated F1Tenth environment, our approach demonstrates continuous lap-to-lap improvement, approaching the performance of an "oracle" policy trained on exact ground-truth friction, whereas standard Multi-Layer Perceptron (MLP) and Recurrent Neural Network (RNN) baselines plateau at inferior performance levels.
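
To make the idea of a persistent hidden state concrete, the following sketch runs a scalar linear state-space recurrence, h_t = a * h_{t-1} + b * x_t, over a single "slip" event. It is deliberately not Mamba: Mamba uses input-dependent (selective) parameters and a multi-dimensional state, whereas this toy has a fixed decay and one state variable. The function name `ssm_scan` and the decay values are assumptions chosen purely for illustration.

```python
def ssm_scan(inputs, decay, input_gain=1.0):
    """Run h_t = decay * h_{t-1} + input_gain * x_t and return every h_t."""
    h = 0.0
    states = []
    for x in inputs:
        h = decay * h + input_gain * x
        states.append(h)
    return states


# One slip event at step 0, followed by 199 steps with no new information.
inputs = [1.0] + [0.0] * 199

slow = ssm_scan(inputs, decay=0.999)  # decay close to 1: the event is still visible
fast = ssm_scan(inputs, decay=0.9)    # faster decay: the event is effectively forgotten

print(f"state after 200 steps, decay 0.999: {slow[-1]:.3f}")
print(f"state after 200 steps, decay 0.9:   {fast[-1]:.2e}")
```

With a decay near 1 the state still carries most of the original signal after 200 steps, while the faster-decaying state has lost it. This is the kind of contrast the abstract draws between long-horizon memory and short-horizon reactive adaptation.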

Index terms

Reinforcement Learning, Machine Learning for Robot Control, Autonomous Vehicle Navigation