ICRA 2026

Vision-Based Policy Learning for High-Speed Autonomous Racing

Haoran Xu, Xianwei Chen, Yilin Lang, Qinyuan Ren

AI summary

A two-phase learning framework distills privileged-information racing policies into a vision-only student network, enabling high-speed, safe, and smooth autonomous racing with zero-shot sim-to-real transfer.

Autonomous Racing, Vision-Based Control, Reinforcement Learning, Knowledge Distillation, Sim-to-Real Transfer, Depth Perception

Problem

Classical modular racing systems suffer from computational inefficiency and error propagation, while existing end-to-end reinforcement learning methods struggle with high-dimensional visual data and lack global track information needed for optimal behavior.

Approach

The authors train a teacher policy using privileged racetrack data and reinforcement learning to generate optimal trajectories, then distill this knowledge into a vision-based student policy using a VAE for noise robustness and an RNN for temporal memory.
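The distillation step can be illustrated with a deliberately small sketch. It is not the authors' implementation: the VAE and RNN are omitted, and the hand-written `teacher_policy`, the noisy `observe` function, and the linear `student_policy` are hypothetical stand-ins. It shows only the core idea, under those assumptions: the teacher acts on privileged state, the student sees a degraded observation of the same state, and the student is regressed onto the teacher's actions.

```python
import random


def teacher_policy(privileged_state):
    """Hypothetical teacher: steers back toward the centerline using exact state."""
    lateral_offset, heading_error = privileged_state
    return -0.8 * lateral_offset - 0.5 * heading_error


def observe(privileged_state, rng, noise_std=0.05):
    """Hypothetical student observation: a noisy view of the same state."""
    return [x + rng.gauss(0.0, noise_std) for x in privileged_state]


def student_policy(weights, observation):
    """Linear student (an assumption; the paper's student is a neural network)."""
    return sum(w * o for w, o in zip(weights, observation))


def distill(steps=2000, lr=0.05, seed=0):
    """Fit the student to the teacher's actions with per-sample gradient steps."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    for _ in range(steps):
        state = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        obs = observe(state, rng)
        target = teacher_policy(state)  # teacher uses privileged information
        error = student_policy(weights, obs) - target
        # Gradient of 0.5 * error**2 with respect to weight i is error * obs[i].
        weights = [w - lr * error * o for w, o in zip(weights, obs)]
    return weights


if __name__ == "__main__":
    print(distill())
```

In this toy setting the student's weights end up close to the teacher's gains, although the student never sees the exact state. A real pipeline would replace the linear map with the paper's vision encoder and recurrent network.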

Key results

High-speed driving with high success rate in simulation
Zero-shot sim-to-real transfer on a 1/10-scale physical race car
Outperforms model-based and learning-based baselines
Robust control under noisy depth observations and partial observability

Why it matters

Enables practical deployment of high-performance autonomous racing agents using only local visual sensors, bridging the sim-to-real gap for dynamic vehicle control.

Abstract

Motion planning for autonomous vision-based car racing is a challenging task in robotics. Classical racing systems divide the task into numerous submodules, undermining computational efficiency and leading to error propagation. Previous studies have demonstrated impressive reinforcement learning (RL) results for end-to-end autonomous driving. However, RL exhibits poor scalability on high-dimensional data, such as images, and it is challenging to learn optimal racing behaviors due to a lack of global information about the environments. To address these issues, a two-phase learning paradigm is proposed in this work to train a vision-based racing policy. First, RL trains a teacher policy that integrates progress maximization with collision avoidance in the reward function and utilizes privileged information about the racetrack to achieve high-performance racing. Then, a student policy, relying only on an ego-centric depth camera for perception, is trained by distilling racing knowledge from the teacher policy. The student policy achieves high-speed driving, a high success rate, and smooth control in vision-based racing games. The proposed approach is validated in simulation and on a real-world 1/10-scale race car, showing that it outperforms previous model-based and learning-based baselines.
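The abstract describes a teacher reward that combines progress maximization with collision avoidance but gives no formula. The sketch below is one plausible shape for such a reward, not the paper's definition: the function name `racing_reward`, the use of progress along the track between two time steps, and the default weights are all assumptions made for illustration.

```python
def racing_reward(progress_prev, progress_now, collided,
                  progress_weight=1.0, collision_penalty=10.0):
    """Hypothetical per-step reward: reward forward progress, penalize collisions.

    progress_prev, progress_now: distance travelled along the track (same
    units) at the previous and current time step.
    collided: True if the car hit a track boundary during this step.
    """
    reward = progress_weight * (progress_now - progress_prev)
    if collided:
        reward -= collision_penalty
    return reward


if __name__ == "__main__":
    print(racing_reward(1.0, 1.5, collided=False))
    print(racing_reward(1.0, 1.5, collided=True))
```

With a penalty much larger than the typical per-step progress, a collision outweighs any short-term gain from cutting too close to a boundary. The actual weighting and any additional terms used by the authors are not stated in this summary.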

Index terms

Machine Learning for Robot Control, Motion Control, Reinforcement Learning