ICRA 2026

V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts

Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen, Stephen F. Smith


AI summary


A graph-of-thoughts reasoning framework for multimodal LLMs significantly improves cooperative perception, prediction, and planning in vehicle-to-vehicle autonomous driving by effectively handling sensor occlusion.

Vehicle-to-Vehicle Cooperation, Multimodal LLMs, Graph-of-Thoughts, Autonomous Driving, Cooperative Perception, Sensor Occlusion

Problem

Autonomous vehicles face safety risks when local sensors are occluded by large objects, and existing multimodal LLM-based cooperative driving systems lack advanced reasoning mechanisms to effectively fuse shared perception and planning data.

Approach

The authors introduce V2V-GoT, a graph-of-thoughts framework that links specialized perception, prediction, and planning questions in a directed reasoning graph. Two ideas proposed in the paper, occlusion-aware perception and planning-aware prediction, supply the context the model draws on before it decides on a plan.
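To make the idea of a directed reasoning graph concrete, here is a minimal sketch of how such a graph can be evaluated: each question node is answered only after all of its parent nodes, and the parents' answers are passed in as context. The node names and edges in `TOY_GRAPH` are hypothetical and only loosely echo the concepts named above; they are not the paper's actual graph or its 9 QA types. `toy_answer` is a placeholder standing in for a call to the MLLM.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# answer_fn(node, parent_answers) -> answer; a stand-in for querying an MLLM.
AnswerFn = Callable[[str, Dict[str, str]], str]


def run_graph_of_thoughts(graph: Dict[str, List[str]], answer_fn: AnswerFn) -> Dict[str, str]:
    """Answer every node after its parents, feeding parent answers as context.

    `graph` maps each node to the list of nodes it depends on (its parents).
    The returned dict preserves the topological order in which nodes were answered.
    """
    answers: Dict[str, str] = {}
    for node in TopologicalSorter(graph).static_order():
        parent_answers = {p: answers[p] for p in graph.get(node, [])}
        answers[node] = answer_fn(node, parent_answers)
    return answers


# Hypothetical question graph (illustrative edges only, not taken from the paper).
TOY_GRAPH: Dict[str, List[str]] = {
    "occlusion_aware_perception": ["perception"],
    "prediction": ["occlusion_aware_perception"],
    "planning": ["prediction"],
    "planning_aware_prediction": ["prediction", "planning"],
    "final_planning": ["planning_aware_prediction"],
}


def toy_answer(node: str, parents: Dict[str, str]) -> str:
    """Placeholder 'model': records which upstream answers it was conditioned on."""
    if not parents:
        return node
    return node + "(" + ",".join(sorted(parents.values())) + ")"


answers = run_graph_of_thoughts(TOY_GRAPH, toy_answer)
print(answers["final_planning"])
```

The design choice shown is that a node may aggregate several parents (here `planning_aware_prediction` consumes both a prediction answer and a planning answer), which is what distinguishes a reasoning graph from a single linear chain of thought.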

Key results

Novel graph-of-thoughts reasoning framework for MLLM-based cooperative driving
Curation of the V2V-GoT-QA dataset, with 9 specialized QA types covering perception, prediction, and planning
Development of the V2V-GoT model, which integrates temporal perception features across timesteps
Superior performance over baselines with reduced collision rates and lower L2 prediction/planning errors
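The L2 errors mentioned above are distances between predicted (or planned) trajectories and ground-truth trajectories. The sketch below uses one common definition, the mean Euclidean distance over matched 2D waypoints. It is an assumption for illustration: this page does not state the exact horizons, units, or averaging the paper uses, and the function name `average_l2_error` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) waypoint, e.g. in metres


def average_l2_error(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean Euclidean (L2) distance between matched predicted and ground-truth waypoints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy trajectories: per-step errors are 0, 1 and 2, so the mean is 1.0.
pred_traj = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
gt_traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(average_l2_error(pred_traj, gt_traj))  # 1.0
```

Lower values mean the predicted or planned waypoints stay closer to the reference trajectory; the same function applies to either task as long as both trajectories are sampled at the same timesteps.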

Why it matters

Provides a scalable, reasoning-driven approach to overcome sensor occlusion in cooperative autonomous driving, advancing safety and reliability for connected vehicle systems.

Abstract

Current state-of-the-art autonomous vehicles could face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced framework for cooperative autonomous driving has further adopted an approach that incorporates a Multimodal Large Language Model (MLLM) to integrate cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered by previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts includes our proposed novel ideas of occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Our experimental results show that our method outperforms other baselines in cooperative perception, prediction, and planning tasks. Our code and dataset are released to facilitate open-source research at https://eddyhkchiu.github.io/v2vgot.github.io/.

Index terms

Computer Vision for Transportation, Intelligent Transportation Systems, Deep Learning for Visual Perception