ICRA 2026

AdaThinkDrive: Adaptive Thinking Via Reinforcement Learning for Autonomous Driving

Yuechen Luo, Fang Li, Shaoqing Xu, Zhiyi Lai, Lei Yang, Qimao Chen, Ziang Luo, Zixun Xie, Shengyin Jiang, Jiaxin Liu, Long Chen, Bing Wang, Zhi-Xin Yang


AI summary


Adaptively switching between fast direct prediction and slow Chain-of-Thought reasoning based on scene complexity improves both accuracy and efficiency in autonomous driving.

Adaptive reasoning, Vision-Language-Action models, Chain-of-Thought, Reinforcement learning, Autonomous driving, Inference efficiency

Problem

Current Chain-of-Thought reasoning in autonomous driving models often over-reasons in simple scenarios, adding unnecessary computational overhead without improving decision quality. The paper addresses how to dynamically select the appropriate reasoning mode based on scene complexity.

Approach

AdaThinkDrive is a dual-mode Vision-Language-Action framework. After pretraining on driving question-answering and trajectory data, it is supervised fine-tuned on both reasoning (CoT) and direct-response data, then optimized with reinforcement learning (GRPO) using a novel Adaptive Think Reward so that it triggers Chain-of-Thought only when the scene calls for it.
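The summary does not give the exact form of the Adaptive Think Reward. As a minimal sketch only, assume each scene yields a group of rollouts, some with CoT and some without, each scored by a trajectory-quality metric (a PDMS-style score in [0, 1]). The hypothetical rule below prefers thinking only when the CoT rollouts score higher on average by more than a margin, otherwise defaults to fast answering, and adds a bonus to rollouts whose mode matches that preference. `Rollout`, `preferred_mode`, `adaptive_think_rewards`, `bonus`, and `margin` are illustrative names, not the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Rollout:
    used_cot: bool     # True if the sampled response contained a Chain-of-Thought
    traj_score: float  # trajectory quality, assumed to be a PDMS-style score in [0, 1]


def preferred_mode(rollouts: list[Rollout], margin: float = 0.0) -> bool:
    """Return True if thinking is preferred for this scene (hypothetical rule).

    CoT is preferred only when its mean trajectory score beats the
    no-CoT mean by more than `margin`; ties default to fast answering.
    """
    think = [r.traj_score for r in rollouts if r.used_cot]
    fast = [r.traj_score for r in rollouts if not r.used_cot]
    if not think:   # no CoT evidence: stay in fast mode
        return False
    if not fast:    # only CoT rollouts were sampled
        return True
    return mean(think) - mean(fast) > margin


def adaptive_think_rewards(
    rollouts: list[Rollout], bonus: float = 0.5, margin: float = 0.0
) -> list[float]:
    """Per-rollout reward: trajectory score plus a bonus for matching the preferred mode."""
    want_cot = preferred_mode(rollouts, margin)
    return [r.traj_score + (bonus if r.used_cot == want_cot else 0.0) for r in rollouts]


if __name__ == "__main__":
    # A "hard" scene: CoT rollouts plan clearly better trajectories.
    hard = [Rollout(True, 0.92), Rollout(True, 0.88), Rollout(False, 0.70), Rollout(False, 0.74)]
    print(preferred_mode(hard))  # True
    print(adaptive_think_rewards(hard))

    # A "simple" scene: both modes plan equally well, so fast answering wins the tie.
    simple = [Rollout(True, 0.90), Rollout(False, 0.90)]
    print(preferred_mode(simple))  # False
```

The tie-break toward fast answering reflects the stated goal of avoiding unnecessary reasoning in simple scenes; a real reward would also need to decide how large `bonus` is relative to the trajectory score.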

Key results

Achieves a PDMS of 90.3 on Navsim, surpassing the best vision-only baseline by 1.7 points
Outperforms the never-think and always-think baselines, improving PDMS by 2.0 and 1.4, respectively
Reduces inference time by 14% compared to the always-think baseline
Selectively applies CoT in 96% of challenging scenarios while defaulting to direct prediction in 84% of simple scenarios
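The reported deltas imply the baseline figures below. This is plain arithmetic under the assumption that each delta is measured against the same 90.3 PDMS and that the 14% is a reduction relative to the always-think inference time; if the ablations used a different configuration, the absolute numbers would differ.

```latex
\begin{aligned}
\text{PDMS}_{\text{best vision-only}} &= 90.3 - 1.7 = 88.6 \\
\text{PDMS}_{\text{never-think}}      &= 90.3 - 2.0 = 88.3 \\
\text{PDMS}_{\text{always-think}}     &= 90.3 - 1.4 = 88.9 \\
t_{\text{AdaThinkDrive}} &\approx (1 - 0.14)\, t_{\text{always-think}} = 0.86\, t_{\text{always-think}}
\end{aligned}
```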

Why it matters

Demonstrates how adaptive reasoning can balance high planning accuracy with computational efficiency, offering a practical blueprint for deploying real-time autonomous driving systems.

Abstract

Reasoning techniques such as Chain-of-Thought (CoT) have been widely adopted in Vision-Language-Action (VLA) models and show promising capabilities in end-to-end autonomous driving. However, recent efforts to integrate CoT reasoning often fall short in simple scenarios, introducing unnecessary computational overhead without improving decision quality. To address this, we propose AdaThinkDrive, a novel VLA framework with a dual-mode reasoning mechanism inspired by fast and slow thinking. First, our framework is pretrained on large-scale autonomous driving (AD) scenarios using both question-answering (QA) and trajectory datasets to acquire world knowledge and driving commonsense. During supervised fine-tuning (SFT), we introduce a two-mode dataset consisting of fast-answering (without CoT) and slow-thinking (with CoT) samples, enabling the model to distinguish between scenarios that require reasoning and those that do not. Furthermore, we propose an Adaptive Think Reward strategy used in conjunction with Group Relative Policy Optimization (GRPO), which rewards the model for selectively applying CoT by comparing trajectory quality across different reasoning modes. Extensive experiments on the Navsim benchmark show that AdaThinkDrive achieves a PDMS of 90.3, surpassing the best vision-only baseline by 1.7 points. Moreover, ablations show that AdaThinkDrive surpasses both the never-Think and always-Think baselines, improving PDMS by 2.0 and 1.4, respectively. It also reduces inference time by 14% compared to the always-Think baseline, demonstrating its ability to balance accuracy and efficiency through adaptive reasoning.
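GRPO avoids a learned value function by scoring each sampled response relative to the other responses drawn for the same prompt. A minimal sketch of that group-relative advantage, assuming the commonly used normalization (r_i - mean(r)) / (std(r) + eps) with the population standard deviation; the function name and the `eps` default are illustrative, and the paper's exact normalization is not stated here.

```python
from __future__ import annotations

from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    Rollouts scoring above the group mean get positive advantages and are
    reinforced; those below get negative advantages. `eps` guards against
    division by zero when every rollout in the group has the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four rollouts of one driving scene.
    print(group_relative_advantages([1.42, 1.38, 0.70, 0.74]))
    # Identical rewards carry no learning signal: all advantages are zero.
    print(group_relative_advantages([1.0, 1.0]))
```

Because advantages are relative within a group, a reward that favors the better reasoning mode for a given scene only matters when rollouts in the same group actually differ in mode or trajectory quality.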

Index terms

Autonomous Vehicle Navigation, Deep Learning for Visual Perception, Computer Vision for Transportation