MTRDrive: Memory-Tool Synergistic Reasoning for Robust Autonomous Driving in Corner Cases
Ziang Luo, Kangan Qian, Jiahua Wang, Jinyu Miao, Zheng Fu, Yuechen Luo, Yunlong Wang, Sicong Jiang, Zilin Huang, Yifei Hu, Yuhao Yang, Hao Ye, Mengmeng Yang, Xiaojian Dong, Kun Jiang, Diange Yang
AI summary
Problem
Current Vision-Language Models for autonomous driving are fragile, prone to visual hallucinations, and fail to generalize in out-of-distribution or corner-case scenarios.
Approach
MTRDrive replaces static decision-making with a closed-loop system that proactively retrieves relevant past driving experiences and dynamically invokes vision tools to ground its reasoning in real-time perception.
Key results
- State-of-the-art planning accuracy (82.6%) and driving metric score (79.8%) on NAVSIM with a 3B-parameter model
- Strong zero-shot generalization on the new Roadwork-VLM benchmark (80.2% driving metric score)
- Effective mitigation of visual hallucinations and improved robustness in complex roadwork scenarios
- Novel two-stage training pipeline combining supervised fine-tuning with GRPO reinforcement learning
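The GRPO stage mentioned in the last bullet can be illustrated with the group-relative advantage that gives the method its name: several responses are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. The sketch below shows only that normalization step. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions for illustration; the reward design used to train MTRDrive is not described in this section.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize the rewards of one sampled group to zero mean and unit scale.

    Hypothetical helper: `rewards` holds one scalar reward per response
    sampled for the same prompt. `eps` (an assumption) avoids division by
    zero when every response in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (assumed)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four sampled plans, two judged correct (reward 1.0) and two not.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses that score above the group average receive a positive advantage and the rest a negative one, so no separately learned value function is needed to form the baseline.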
Why it matters
Provides a scalable, reliable pathway for deploying end-to-end autonomous driving systems in real-world, unpredictable environments.
Abstract
Vision-Language Models (VLMs) have demonstrated significant potential for end-to-end autonomous driving, yet a substantial gap remains between their current capabilities and the reliability required for real-world deployment. A critical challenge is their fragility, characterized by hallucinations and poor generalization in out-of-distribution (OOD) scenarios. To bridge this gap, we introduce MTRDrive, a novel framework that integrates procedural driving experiences with a dynamic toolkit to enhance generalization and proactive decision-making. MTRDrive operates as a closed-loop system that combines a memory-based experience retrieval mechanism with dynamically invoked tools. This memory-tool synergistic reasoning enables the model to interact more effectively with its environment, improving both its reasoning and its decision-making. Additionally, we introduce a new benchmark based on complex roadwork construction scenarios to rigorously evaluate zero-shot generalization. Extensive experiments demonstrate the effectiveness of our approach. On the public NAVSIM benchmark, our 3B-parameter MTRDrive model achieves a PDMS of 88.3 without chain-of-thought and sets a new state of the art on high-level planning, with a driving metric score of 79.8% and a planning accuracy of 82.6%. Zero-shot evaluation on the new Roadwork-VLM benchmark shows a strong ability to reason robustly in unseen scenarios, achieving a driving metric score of 80.2%. These results highlight MTRDrive’s potential to advance autonomous driving toward safer and more reliable systems.
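To make the closed loop described in the abstract concrete, here is a minimal sketch of one reasoning step: recall the most similar past experience, then call the tools that experience recommends so the decision is grounded in fresh observations. Everything named here is hypothetical and chosen for illustration: the `Experience` record, the cosine-similarity retrieval, the `TOOLS` registry, and the tool names `zoom_in` and `detect_cones`. The abstract does not specify how MTRDrive stores memories or which tools it exposes.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Experience:
    """A stored driving experience (hypothetical structure)."""
    embedding: List[float]   # scene embedding used as the retrieval key
    lesson: str              # what was learned in that situation
    tools: List[str]         # tools that proved useful there


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(memory: List[Experience], query: List[float], k: int = 1) -> List[Experience]:
    """Return the k experiences most similar to the current scene embedding."""
    ranked = sorted(memory, key=lambda e: cosine(e.embedding, query), reverse=True)
    return ranked[:k]


# Stand-in vision tools (assumed names); real tools would process camera frames.
TOOLS: Dict[str, Callable[[str], str]] = {
    "zoom_in": lambda scene: f"zoomed view of {scene}",
    "detect_cones": lambda scene: f"cone detections in {scene}",
}


def reason_step(memory: List[Experience], query: List[float], scene: str) -> dict:
    """One loop iteration: recall experience, then invoke the tools it suggests."""
    recalled = retrieve(memory, query, k=1)
    observations = [
        TOOLS[name](scene)
        for exp in recalled
        for name in exp.tools
        if name in TOOLS  # ignore tool names the registry does not know
    ]
    return {"lessons": [e.lesson for e in recalled], "observations": observations}


memory = [
    Experience([1.0, 0.0], "slow down near cones", ["detect_cones"]),
    Experience([0.0, 1.0], "yield at merge", ["zoom_in"]),
]
result = reason_step(memory, [0.9, 0.1], "front camera")
```

In this toy setup the query embedding lies closest to the first stored experience, so the step recalls its lesson and runs only the cone detector. In a full system the returned lessons and observations would be passed back to the VLM as context for the planning decision.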