Research Analyzer
ICRA 2026

Unveiling the Surprising Efficacy of Navigation Understanding in End-To-End Autonomous Driving

Zhihua Hua, Junli Wang, Pengfei Li, Qihao Jin, Bo Zhang, Kehua Sheng, Yilun Chen, Zhongxue Gan, Wenchao Ding

AI summary

Current end-to-end driving models largely ignore global navigation commands, but structuring navigation as sequential paths and turn-by-turn cues dramatically improves planning performance and achieves state-of-the-art results.
End-to-end autonomous driving, Global navigation, Turn-by-turn guidance, Vision-language-action models, Planning reasoning, SNG-VLA

Problem

Existing end-to-end autonomous driving systems over-rely on local scene understanding and fail to effectively utilize global navigation information, often due to oversimplified one-hot driving commands that cause intent ambiguity and causal confusion in complex scenarios.
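The intent ambiguity of one-hot commands can be illustrated with a minimal sketch. The four-way command vocabulary and the two example situations below are hypothetical, chosen only for illustration; the command set of any particular benchmark may differ. Two situations that require different behavior collapse to the same encoding:

```python
# Hypothetical command vocabulary (an assumption, not taken from the paper).
COMMANDS = ["left", "straight", "right", "follow_lane"]


def one_hot(command):
    """Encode a discrete driving command as a one-hot list."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    return [1 if c == command else 0 for c in COMMANDS]


# Two illustrative situations that differ in where the turn should happen.
turn_at_next_junction = {"command": "left", "distance_to_turn_m": 30.0}
turn_at_second_junction = {"command": "left", "distance_to_turn_m": 180.0}

enc_a = one_hot(turn_at_next_junction["command"])
enc_b = one_hot(turn_at_second_junction["command"])

# The encodings are identical, so the planner cannot tell the two cases apart
# from the command alone; the distance information is lost.
same = enc_a == enc_b
```

In this sketch the planner receives no information about which junction the turn refers to, which is one way the ambiguity described above can arise.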

Approach

The authors propose Sequential Navigation Guidance (SNG), which replaces simple driving commands with detailed navigation paths and real-time turn-by-turn instructions, and introduce the SNG-VLA model and SNG-QA dataset to fuse this structured global navigation with local perception for planning.
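As a rough sketch of what such a representation might look like in code, the example below pairs a navigation path with a turn-by-turn cue and serializes both into text for a vision-language model. All class names, field names, the coordinate convention, and the prompt format are assumptions made for illustration; the summary does not specify how SNG-VLA actually encodes its inputs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TurnByTurnCue:
    """Hypothetical real-time instruction, e.g. 'turn_left' in 30 m."""
    action: str
    distance_m: float


@dataclass
class SequentialNavigationGuidance:
    """Hypothetical container: a path for long-term constraint plus a TBT cue."""
    # Waypoints in an assumed ego frame (x forward, y left), in meters.
    path: List[Tuple[float, float]]
    tbt: TurnByTurnCue

    def to_prompt(self) -> str:
        """Serialize the guidance into a text prompt (format is an assumption)."""
        pts = ", ".join(f"({x:.1f}, {y:.1f})" for x, y in self.path)
        return (
            f"Navigation path: {pts}. "
            f"Instruction: {self.tbt.action} in {self.tbt.distance_m:.0f} m."
        )


sng = SequentialNavigationGuidance(
    path=[(0.0, 0.0), (20.0, 0.0), (30.0, 5.0)],
    tbt=TurnByTurnCue(action="turn_left", distance_m=30.0),
)
prompt = sng.to_prompt()
```

Unlike a single discrete command, this structure carries both where the route goes and when the next maneuver is due.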

Key results

  • Proposes Sequential Navigation Guidance (SNG) to structure global navigation into paths and turn-by-turn cues
  • Introduces SNG-QA dataset aligning global and local planning reasoning
  • Develops SNG-VLA model achieving state-of-the-art performance on NAVSIM and Bench2Drive benchmarks
  • Demonstrates that precise navigation modeling improves planning without auxiliary perception losses

Why it matters

It reveals a critical flaw in current end-to-end planners and provides a plug-and-play navigation representation that significantly improves planning rationality, safety, and real-world deployment readiness for autonomous vehicles.

Abstract

Global navigation information and local scene understanding are two crucial components of autonomous driving systems. However, our experimental results indicate that many end-to-end autonomous driving systems tend to over-rely on local scene understanding while failing to utilize global navigation information. These systems exhibit weak correlation between their planning capabilities and navigation input, and struggle to perform navigation-following in complex scenarios. To overcome this limitation, we propose the Sequential Navigation Guidance (SNG) framework, an efficient representation of global navigation information based on real-world navigation patterns. The SNG encompasses both navigation paths for constraining long-term trajectories and turn-by-turn (TBT) information for real-time decision-making logic. We constructed the SNG-QA dataset, a visual question answering (VQA) dataset based on SNG that aligns global and local planning. Additionally, we introduce an efficient model SNG-VLA that fuses local planning with global planning. The SNG-VLA achieves state-of-the-art performance through precise navigation information modeling without requiring auxiliary loss functions from perception tasks. Project page: SNG-VLA

Index terms

Autonomous Vehicle Navigation, Motion and Path Planning, Intelligent Transportation Systems
