Unveiling the Surprising Efficacy of Navigation Understanding in End-To-End Autonomous Driving
Zhihua Hua, Junli Wang, Pengfei Li, Qihao Jin, Bo Zhang, Kehua Sheng, Yilun Chen, Zhongxue Gan, Wenchao Ding
AI summary
Problem
Existing end-to-end autonomous driving systems over-rely on local scene understanding and fail to effectively utilize global navigation information, often due to oversimplified one-hot driving commands that cause intent ambiguity and causal confusion in complex scenarios.
Approach
The authors propose Sequential Navigation Guidance (SNG), which replaces simple driving commands with detailed navigation paths and real-time turn-by-turn instructions, and introduce the SNG-VLA model and SNG-QA dataset to fuse this structured global navigation with local perception for planning.
Key results
- Proposes Sequential Navigation Guidance (SNG) to structure global navigation into paths and turn-by-turn cues
- Introduces SNG-QA dataset aligning global and local planning reasoning
- Develops SNG-VLA model achieving state-of-the-art performance on NAVSIM and Bench2Drive benchmarks
- Demonstrates that precise navigation modeling improves planning without auxiliary perception losses
Why it matters
It reveals a critical flaw in current end-to-end planners and provides a plug-and-play navigation representation that significantly improves planning rationality, safety, and real-world deployment readiness for autonomous vehicles.
Abstract
Global navigation information and local scene understanding are two crucial components of autonomous driving systems. However, our experimental results indicate that many end-to-end autonomous driving systems tend to over-rely on local scene understanding while failing to utilize global navigation information. These systems exhibit weak correlation between their planning capabilities and navigation input, and struggle to perform navigation-following in complex scenarios. To overcome this limitation, we propose the Sequential Navigation Guidance (SNG) framework, an efficient representation of global navigation information based on real-world navigation patterns. The SNG encompasses both navigation paths for constraining long-term trajectories and turn-by-turn (TBT) information for real-time decision-making logic. We construct the SNG-QA dataset, a visual question answering (VQA) dataset based on SNG that aligns global and local planning. Additionally, we introduce an efficient model, SNG-VLA, that fuses local planning with global planning. SNG-VLA achieves state-of-the-art performance through precise navigation information modeling without requiring auxiliary loss functions from perception tasks. Project page: SNG-VLA
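To make the contrast between a one-hot driving command and SNG concrete, the following is a minimal illustrative sketch and not the authors' implementation. The class names, field names, maneuver string, and waypoint values are all hypothetical. The abstract states only that SNG combines a navigation path with TBT information. The sketch shows how two different routes can collapse to the same one-hot command while remaining distinguishable under a path-plus-TBT representation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Command(Enum):
    """Coarse driving command (illustrative set, assumed for this sketch)."""
    LEFT = 0
    STRAIGHT = 1
    RIGHT = 2


@dataclass(frozen=True)
class TurnByTurn:
    """Hypothetical TBT cue: a maneuver and the distance to it in meters."""
    maneuver: str
    distance_m: float


@dataclass(frozen=True)
class SequentialNavigationGuidance:
    """Hypothetical SNG container: a navigation path plus the current TBT cue."""
    path_xy: Tuple[Tuple[float, float], ...]  # ego-frame waypoints in meters
    tbt: TurnByTurn


def one_hot(command: Command) -> List[int]:
    """Encode a command as a one-hot vector in enum definition order."""
    return [1 if command is c else 0 for c in Command]


# Two routes that both turn left, at the first vs. the second intersection.
route_a = SequentialNavigationGuidance(
    path_xy=((0.0, 0.0), (20.0, 0.0), (20.0, 15.0)),
    tbt=TurnByTurn(maneuver="turn_left", distance_m=20.0),
)
route_b = SequentialNavigationGuidance(
    path_xy=((0.0, 0.0), (60.0, 0.0), (60.0, 15.0)),
    tbt=TurnByTurn(maneuver="turn_left", distance_m=60.0),
)

# Both routes reduce to the same one-hot command, so the intent is ambiguous.
same_command = one_hot(Command.LEFT) == one_hot(Command.LEFT)

# The path and TBT distance differ, so the two intents stay distinguishable.
same_guidance = route_a == route_b
```

Here `same_command` is true while `same_guidance` is false, which is the kind of intent ambiguity that a single coarse command cannot resolve on its own.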