ICRA 2026

MTRDrive: Memory-Tool Synergistic Reasoning for Robust Autonomous Driving in Corner Cases

Ziang Luo, Kangan Qian, Jiahua Wang, Jinyu Miao, Zheng Fu, Yuechen Luo, Yunlong Wang, Sicong Jiang, Zilin Huang, Yifei Hu, Yuhao Yang, Hao Ye, Mengmeng Yang, Xiaojian Dong, Kun Jiang, Diange Yang

AI summary

MTRDrive significantly improves autonomous driving reliability in unseen scenarios by synergizing retrieved past experiences with dynamic tool use for interactive reasoning.
Autonomous Driving, Vision-Language Models, Memory-Tool Synergy, Out-of-Distribution Generalization, Interactive Reasoning, Reinforcement Learning

Problem

Current Vision-Language Models for autonomous driving are fragile, prone to visual hallucinations, and fail to generalize in out-of-distribution or corner-case scenarios.

Approach

MTRDrive replaces static decision-making with a closed-loop system that proactively retrieves relevant past driving experiences and dynamically invokes vision tools to ground its reasoning in real-time perception.
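
The loop described above can be illustrated with a toy sketch. It is not the authors' implementation: the `Experience` record, the cosine-similarity retrieval, the two tools in `TOOLS`, and the rule-based `toy_policy` standing in for the VLM are all hypothetical names and simplifications introduced here for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Experience:
    embedding: list  # hypothetical scene embedding (assumption)
    advice: str      # hypothetical lesson stored with that scene


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(memory, query, k=1):
    """Return the k stored experiences most similar to the query embedding."""
    ranked = sorted(memory, key=lambda e: cosine(e.embedding, query), reverse=True)
    return ranked[:k]


# Hypothetical vision tools; each returns a textual observation.
TOOLS = {
    "zoom": lambda scene: f"zoomed view of {scene['focus']}",
    "detect": lambda scene: f"detected objects: {', '.join(scene['objects'])}",
}


def plan(scene, memory, policy, max_steps=3):
    """Retrieve experience first, then alternate tool calls and reasoning."""
    context = [e.advice for e in retrieve(memory, scene["embedding"])]
    for _ in range(max_steps):
        action = policy(scene, context)
        if action in TOOLS:
            # Tool call: feed the observation back into the context.
            context.append(TOOLS[action](scene))
        else:
            # Any non-tool output is treated as the final driving decision.
            return action, context
    return "stop", context  # conservative fallback (assumption)


def toy_policy(scene, context):
    """Rule-based stand-in for the VLM: look before deciding."""
    if not any(c.startswith("detected") for c in context):
        return "detect"
    return "slow_down" if "cone" in scene["objects"] else "keep_lane"


memory = [
    Experience([1.0, 0.0], "roadwork: slow down near cones"),
    Experience([0.0, 1.0], "highway: keep lane"),
]
scene = {"embedding": [0.9, 0.1], "objects": ["cone", "worker"], "focus": "cone"}
decision, trace = plan(scene, memory, toy_policy)
```

In this sketch the retrieved advice and each tool observation accumulate in one context list, so the final decision is conditioned on both memory and fresh perception, which is the synergy the summary describes.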

Key results

  • State-of-the-art planning accuracy (82.6%) and driving metric score (79.8%) on NAVSIM with a 3B-parameter model
  • Strong zero-shot generalization on the new Roadwork-VLM benchmark (80.2% driving metric score)
  • Effective mitigation of visual hallucinations and improved robustness in complex roadwork scenarios
  • Novel two-stage training pipeline combining supervised fine-tuning with GRPO reinforcement learning
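
The GRPO stage mentioned in the last result can be illustrated by its central step, the group-relative advantage: several responses are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. The sketch below shows only that step. The function name, the use of the population standard deviation, and the example rewards are assumptions made here; the summary does not specify the paper's reward design.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for responses sampled from the same prompt.
    eps: small constant that avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ (assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled plans (1.0 = correct, 0.0 = wrong).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses that beat the group average receive positive advantages and the rest negative ones, so no separate value network is needed to provide a baseline.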

Why it matters

Provides a scalable, reliable pathway for deploying end-to-end autonomous driving systems in real-world, unpredictable environments.

Abstract

Vision-Language Models (VLMs) have demonstrated significant potential for end-to-end autonomous driving, yet a substantial gap remains between their current capabilities and the reliability necessary for real-world deployment. A critical challenge is their fragility, characterized by hallucinations and poor generalization in out-of-distribution (OOD) scenarios. To bridge this gap, we introduce MTRDrive, a novel framework that integrates procedural driving experiences with a dynamic toolkit to enhance generalization and proactive decision-making. MTRDrive addresses these limitations through a closed-loop system that combines a memory-based experience retrieval mechanism with dynamic toolkits. This synergy enables the model to interact more effectively with its environment, improving both reasoning and decision-making capabilities with the help of our memory-tool synergistic reasoning. Additionally, we introduce a new benchmark based on complex Roadwork construction scenarios to rigorously evaluate zero-shot generalization. Extensive experiments demonstrate the superior effectiveness of our approach. On the public NAVSIM benchmark, our 3B-parameter MTRDrive model achieves an exceptional PDMS of 88.3 without chain-of-thought and sets a state-of-the-art performance bar on high-level planning, with a driving metric score of 79.8% and a planning accuracy of 82.6%. Rigorous zero-shot evaluation on the new Roadwork-VLM benchmark shows a strong ability to reason robustly in unseen scenarios, achieving a driving metric score of 80.2%. These results highlight MTRDrive’s potential to advance autonomous driving toward safer and more reliable systems.

Index terms

Autonomous Vehicle Navigation, Automation Technologies for Smart Cities