ICRA 2026

A Champion-Level Vision-Based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7

Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter Wurman

AI summary

The first vision-based reinforcement learning agent to achieve champion-level performance in competitive racing using only onboard camera and sensor data.

Vision-based RL, Autonomous Racing, Gran Turismo 7, Asymmetric Actor-Critic, Recurrent Memory, Real-world Deployment

Problem

Existing superhuman racing agents rely on global features like precise track geometry and opponent localization, which are difficult to obtain in real-world settings and introduce latency.

Approach

We train an asymmetric actor-critic network where the actor processes only ego-centric camera and IMU data through a recurrent memory module to handle partial observability, while the critic uses global features during training.
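The split between actor and critic inputs can be illustrated with a minimal sketch. This is not the authors' network: the class names, dimensions, and random weights below are hypothetical, the observation vectors are placeholders for encoded camera/IMU features and global track features, and no training loop is shown. It only demonstrates that the actor carries a recurrent hidden state over local observations, while the critic reads privileged global features that are not needed at inference time.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class RecurrentActor:
    """Toy actor (hypothetical): sees only local observations, keeps a hidden state."""

    def __init__(self, obs_dim, hidden_dim, action_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.w_in = mat(hidden_dim, obs_dim)      # local observation -> hidden
        self.w_rec = mat(hidden_dim, hidden_dim)  # previous hidden -> hidden
        self.w_out = mat(action_dim, hidden_dim)  # hidden -> action

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, local_obs, hidden):
        # Simple tanh recurrence; a stand-in for whatever memory module is used.
        new_hidden = [
            math.tanh(dot(w_i, local_obs) + dot(w_r, hidden))
            for w_i, w_r in zip(self.w_in, self.w_rec)
        ]
        # Actions squashed to [-1, 1], e.g. steering and throttle/brake.
        action = [math.tanh(dot(w_o, new_hidden)) for w_o in self.w_out]
        return action, new_hidden


class PrivilegedCritic:
    """Toy critic (hypothetical): linear value estimate from global features."""

    def __init__(self, global_dim, seed=1):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(global_dim)]

    def value(self, global_features):
        return dot(self.w, global_features)


actor = RecurrentActor(obs_dim=4, hidden_dim=8, action_dim=2)
critic = PrivilegedCritic(global_dim=6)

hidden = actor.initial_state()
for t in range(3):
    local_obs = [0.1 * t] * 4        # placeholder for encoded camera + IMU features
    global_features = [0.2 * t] * 6  # placeholder for track geometry / opponent positions
    action, hidden = actor.step(local_obs, hidden)
    value = critic.value(global_features)  # would only feed training targets

# At inference, only `actor.step` is called; the critic and its inputs are dropped.
```

Because the critic is discarded after training, the deployed policy depends only on what `RecurrentActor.step` receives, which is the property the approach relies on.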

Key results

Consistently secures first place against 19 built-in AI opponents starting from last position
Outperforms human expert and champion drivers across Tokyo, Spa, and Sarthe tracks
First vision-based agent to achieve champion-level performance in competitive multi-opponent racing
Ablation studies validate the critical role of recurrent memory and asymmetric architecture for handling occlusions

Why it matters

Eliminates the need for external instrumentation at inference time, paving the way for practical real-world deployment of high-performance autonomous racing systems.

Abstract

Deep reinforcement learning has achieved superhuman racing performance in high-fidelity simulators like Gran Turismo 7 (GT7). It typically utilizes global features that require instrumentation external to a car, such as precise localization of agents and opponents, limiting real-world applicability. To address this limitation, we introduce a vision-based autonomous racing agent that relies solely on ego-centric camera views and onboard sensor data, eliminating the need for precise localization during inference. This agent employs an asymmetric actor-critic framework: the actor uses a recurrent neural network with the sensor data local to the car to retain track layouts and opponent positions, while the critic accesses the global features during training. Evaluated in GT7, our agent consistently outperforms GT7’s built-in AI drivers. To our knowledge, this work presents the first vision-based autonomous racing agent to demonstrate champion-level performance in competitive racing scenarios.

Index terms

Autonomous Agents, Reinforcement Learning, Vision-Based Navigation