Vision-Based Policy Learning for High-Speed Autonomous Racing
Haoran Xu, Xianwei Chen, Yilin Lang, Qinyuan Ren
AI summary
Problem
Classical modular racing systems suffer from computational inefficiency and error propagation, while existing end-to-end reinforcement learning methods struggle with high-dimensional visual data and lack global track information needed for optimal behavior.
Approach
The authors train a teacher policy using privileged racetrack data and reinforcement learning to generate optimal trajectories, then distill this knowledge into a vision-based student policy using a VAE for noise robustness and an RNN for temporal memory.
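The teacher-to-student distillation step can be illustrated with a toy sketch. This is not the authors' implementation. The teacher here is a hypothetical hand-written controller standing in for the RL-trained policy, and the student is a two-weight linear model standing in for the VAE and RNN student. All names (`teacher_policy`, `distill`) and constants are assumptions chosen for illustration. The only point is the training signal: the student sees noisy observations and regresses onto the teacher's actions.

```python
import random


def teacher_policy(lateral_offset, heading_error):
    """Hypothetical privileged teacher: steers back toward the centerline.

    Stands in for the RL-trained teacher; the gains are made up.
    """
    return -0.8 * lateral_offset - 1.5 * heading_error


def distill(num_steps=5000, lr=0.1, noise_std=0.02, seed=0):
    """Fit a linear student to imitate the teacher from noisy observations."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]  # linear stand-in for the vision-based student
    for _ in range(num_steps):
        # Privileged state, visible to the teacher only.
        offset = rng.uniform(-1.0, 1.0)
        heading = rng.uniform(-0.5, 0.5)
        target_action = teacher_policy(offset, heading)

        # The student receives a noise-corrupted version of the state.
        obs = [offset + rng.gauss(0.0, noise_std),
               heading + rng.gauss(0.0, noise_std)]
        student_action = weights[0] * obs[0] + weights[1] * obs[1]

        # One SGD step on the imitation loss 0.5 * (student - teacher)^2.
        error = student_action - target_action
        weights[0] -= lr * error * obs[0]
        weights[1] -= lr * error * obs[1]
    return weights


student_weights = distill()
```

After training, `student_weights` should lie close to the teacher's gains, which shows that the student can recover the teacher's behavior without access to the noise-free state.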
Key results
- High-speed driving with high success rate in simulation
- Zero-shot sim-to-real transfer on a 1/10-scale physical race car
- Outperforms model-based and learning-based baselines
- Robust control under noisy depth observations and partial observability
Why it matters
Enables practical deployment of high-performance autonomous racing agents using only local visual sensors, bridging the sim-to-real gap for dynamic vehicle control.
Abstract
Motion planning for autonomous vision-based car racing is a challenging task in robotics. Classical racing systems divide the task into numerous submodules, undermining computational efficiency and leading to error propagation. Previous studies have demonstrated impressive reinforcement learning (RL) results for end-to-end autonomous driving. However, RL exhibits poor scalability on high-dimensional data, such as images, and it is challenging to learn optimal racing behaviors due to a lack of global information about the environment. To address these issues, a two-phase learning paradigm is proposed in this work to train a vision-based racing policy. First, RL trains a teacher policy that integrates progress maximization with collision avoidance in the reward function and utilizes privileged information about the racetrack to achieve high-performance racing. Then, a student policy, relying only on an ego-centric depth camera for perception, is trained by distilling racing knowledge from the teacher policy. The student policy achieves high-speed driving, a high success rate, and smooth control in vision-based racing games. The proposed approach is validated in simulation and on a real-world 1/10-scale race car, showing that it outperforms previous model-based and learning-based baselines.
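The abstract states that the teacher's reward combines progress maximization with collision avoidance but does not give its form. A minimal sketch of one plausible shape follows. The function name `racing_reward`, the additive structure, and the penalty value are all assumptions, not the reward used in the paper.

```python
def racing_reward(prev_progress, progress, collided, collision_penalty=10.0):
    """Hypothetical per-step reward: track progress gained minus a crash penalty.

    prev_progress, progress: distance along the track centerline (meters)
        at the previous and current step.
    collided: True if the car hit a track boundary during this step.
    collision_penalty: assumed constant; the abstract does not specify one.
    """
    reward = progress - prev_progress  # progress maximization term
    if collided:
        reward -= collision_penalty  # collision avoidance term
    return reward
```

For example, advancing from 1.0 m to 1.5 m yields 0.5 without a collision and -9.5 with one, so under these assumed values a crash outweighs any single step of progress.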