V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts
Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen, Stephen F. Smith
AI summary
Problem
Autonomous vehicles face safety risks when local sensors are occluded by large objects, and existing multimodal LLM-based cooperative driving systems lack advanced reasoning mechanisms to effectively fuse shared perception and planning data.
Approach
The authors introduce V2V-GoT, a graph-of-thoughts framework that connects specialized perception and prediction questions in a directed reasoning graph, allowing the model to leverage occlusion-aware and planning-aware contexts for better decision-making.
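As a rough illustration of the directed reasoning graph, the sketch below answers each question only after the questions it depends on have been answered, and passes those earlier answers along as context. The node names, the edges, and the `answer_question` stub are hypothetical and chosen for illustration; they are not the paper's actual graph or model interface.

```python
from graphlib import TopologicalSorter

# Hypothetical question nodes. Each key maps to the set of questions
# whose answers it consumes (its predecessors in the directed graph).
REASONING_GRAPH = {
    "occlusion_aware_perception": set(),
    "planning_aware_prediction": {"occlusion_aware_perception"},
    "planning": {"occlusion_aware_perception", "planning_aware_prediction"},
}


def answer_question(name, context):
    """Stand-in for an MLLM call (assumption): returns a placeholder string."""
    used = ", ".join(sorted(context)) or "no prior answers"
    return f"answer to {name} (given {used})"


def run_graph(graph):
    """Answer every question in dependency order, feeding earlier answers forward."""
    answers = {}
    for node in TopologicalSorter(graph).static_order():
        context = {dep: answers[dep] for dep in graph[node]}
        answers[node] = answer_question(node, context)
    return answers


answers = run_graph(REASONING_GRAPH)
for name, text in answers.items():
    print(f"{name}: {text}")
```

A topological ordering guarantees that the planning question sees both the perception and the prediction answers, which is the property a graph of thoughts adds over asking each question independently.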
Key results
- Novel graph-of-thoughts reasoning framework for MLLM-based cooperative driving
- Curation of V2V-GoT-QA dataset with 9 specialized QA types for perception, prediction, and planning
- Development of V2V-GoT model integrating temporal perception features across timesteps
- Superior performance over baselines with reduced collision rates and lower L2 prediction/planning errors
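For readers unfamiliar with the L2 error mentioned above, the following sketch computes an average Euclidean distance between predicted and ground-truth waypoints. It assumes 2D waypoints and a plain mean over all waypoints; the paper's exact evaluation protocol (time horizons, units, averaging) is not stated in this summary, and `l2_error` is a hypothetical helper name.

```python
import math


def l2_error(predicted, ground_truth):
    """Mean Euclidean distance between matching waypoints (assumed metric form)."""
    if len(predicted) != len(ground_truth):
        raise ValueError("trajectories must have the same number of waypoints")
    if not predicted:
        raise ValueError("trajectories must not be empty")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Toy waypoints, made up for illustration only.
predicted = [(0.0, 0.0), (3.0, 4.0)]
ground_truth = [(0.0, 0.0), (0.0, 0.0)]
error = l2_error(predicted, ground_truth)
print(error)
```

Lower values mean the predicted or planned trajectory stays closer to the reference trajectory.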
Why it matters
Provides a scalable, reasoning-driven approach to overcome sensor occlusion in cooperative autonomous driving, advancing safety and reliability for connected vehicle systems.
Abstract
Current state-of-the-art autonomous vehicles could face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced framework for cooperative autonomous driving has further adopted an approach that incorporates a Multimodal Large Language Model (MLLM) to integrate cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered by previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts includes our proposed novel ideas of occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Our experimental results show that our method outperforms other baselines in cooperative perception, prediction, and planning tasks. Our code and dataset are released to facilitate open-source research at https://eddyhkchiu.github.io/v2vgot.github.io/.