DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving
Xinmeng Hou, Wuqi Wang, Long Yang, Hao Lin, Jinglun Feng, Haigen Min, Xiangmo Zhao
AI summary
Problem
Existing autonomous driving architectures struggle to integrate heterogeneous sensor modalities and lack interpretability, particularly in dynamic or ambiguous driving scenarios.
Approach
The framework orchestrates specialized agents for descriptive analysis, vehicle and environmental reasoning, and decision generation, using LLMs to align and interpret camera, LiDAR, IMU, and GPS inputs.
Key results
- 26.31% improvement in vehicle reasoning over baselines
- Up to 2.85% enhancement in environmental reasoning
- Introduction of a three-tier driving dataset for comprehensive evaluation
- Fine-tuned vision-language model for robust object detection and traffic interpretation
Why it matters
It advances the reliability and interpretability of autonomous driving systems by demonstrating how LLM-driven multi-agent fusion can effectively handle complex, real-world sensor data.
Abstract
We introduce DriveAgent, a modular multi-agent framework that combines large language model (LLM) reasoning with multimodal sensor fusion for autonomous driving. DriveAgent orchestrates specialized agents operating on camera, Light Detection and Ranging (LiDAR), Inertial Measurement Unit (IMU), and Global Positioning System (GPS) data with LLM-driven analytical processes to deliver temporally aligned perception, causal reasoning, and action recommendations. The framework operates through a modular agent-based pipeline comprising four principal modules: (i) a descriptive analysis agent identifying critical sensor data events based on filtered timestamps, (ii) dedicated vehicle-level analysis conducted by LiDAR and vision agents that collaboratively assess vehicle conditions and movements, (iii) environmental reasoning and causal analysis agents explaining contextual changes and their underlying mechanisms, and (iv) an urgency-aware decision-generation agent prioritizing insights and proposing timely maneuvers. This modular design empowers the LLM to effectively coordinate specialized perception and reasoning agents, delivering cohesive, interpretable insights into complex autonomous driving scenarios. Extensive experiments demonstrate that DriveAgent substantially outperforms baseline methods, achieving a 26.31% improvement in vehicle reasoning and consistent enhancements of up to 2.85% in environmental reasoning. These results highlight the effectiveness of our LLM-driven multi-agent sensor fusion framework in boosting the robustness and reliability of autonomous driving systems.
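The four-module pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`SensorFrame`, `Insight`, `select_events`, `run_pipeline`, `stub_llm`), the IMU-threshold rule for picking critical timestamps, and the use of acceleration magnitude as an urgency score are assumptions made only for illustration. The LLM is replaced by a stub function so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

LLM = Callable[[str], str]


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; it only tags the prompt."""
    return f"[summary] {prompt}"


@dataclass
class SensorFrame:
    timestamp: float
    camera: str              # placeholder text standing in for an image
    lidar: str               # placeholder text standing in for a point cloud
    imu_accel: float         # acceleration magnitude (assumed unit: m/s^2)
    gps: Tuple[float, float]


@dataclass
class Insight:
    timestamp: float
    stage: str
    text: str
    urgency: float


def select_events(frames: List[SensorFrame], accel_threshold: float = 2.0) -> List[SensorFrame]:
    """Module (i): keep frames whose IMU reading crosses an assumed threshold."""
    return [f for f in frames if abs(f.imu_accel) >= accel_threshold]


def vehicle_analysis(frame: SensorFrame, llm: LLM) -> Insight:
    """Module (ii): combine LiDAR and vision descriptions into one vehicle-level insight."""
    text = llm(f"vehicle state from lidar={frame.lidar!r} and camera={frame.camera!r}")
    return Insight(frame.timestamp, "vehicle", text, urgency=abs(frame.imu_accel))


def environment_analysis(frame: SensorFrame, vehicle: Insight, llm: LLM) -> Insight:
    """Module (iii): ask for a causal explanation given the vehicle-level insight."""
    text = llm(f"explain cause at gps={frame.gps} given: {vehicle.text}")
    return Insight(frame.timestamp, "environment", text, urgency=vehicle.urgency)


def decide(insights: List[Insight], llm: LLM) -> str:
    """Module (iv): act on the most urgent insight (urgency score is an assumption)."""
    if not insights:
        return "no action"
    top = max(insights, key=lambda i: i.urgency)
    return llm(f"recommend maneuver at t={top.timestamp} given: {top.text}")


def run_pipeline(frames: List[SensorFrame], llm: LLM = stub_llm) -> str:
    """Run modules (i) through (iv) in order over a list of sensor frames."""
    insights: List[Insight] = []
    for frame in select_events(frames):
        vehicle = vehicle_analysis(frame, llm)
        insights.append(vehicle)
        insights.append(environment_analysis(frame, vehicle, llm))
    return decide(insights, llm)


frames = [
    SensorFrame(0.0, "clear road", "no obstacles", 0.1, (34.0, 108.9)),
    SensorFrame(1.0, "car braking ahead", "object at 8 m", 3.5, (34.0, 108.9)),
    SensorFrame(2.0, "car slowing", "object at 12 m", 2.2, (34.0, 108.9)),
]
recommendation = run_pipeline(frames)
```

Passing the LLM as a plain callable keeps each stage testable in isolation; a real system would substitute an actual model client and real sensor payloads for the text placeholders used here.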