ICRA 2026

DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving

Xinmeng Hou, Wuqi Wang, Long Yang, Hao Lin, Jinglun Feng, Haigen Min, Xiangmo Zhao

AI summary

DriveAgent uses a modular multi-agent framework and LLM-driven reasoning to fuse multimodal sensor data, improving vehicle reasoning by 26.31% and environmental reasoning by up to 2.85% over baseline methods.

Autonomous Driving, Large Language Models, Multi-Agent Systems, Multimodal Sensor Fusion, Structured Reasoning, Causal Analysis

Problem

Existing autonomous driving architectures struggle to integrate heterogeneous sensor modalities and lack interpretability, particularly in dynamic or ambiguous driving scenarios.

Approach

The framework orchestrates specialized agents for descriptive analysis, vehicle and environmental reasoning, and decision generation, using LLMs to align and interpret camera, LiDAR, IMU, and GPS inputs.
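
The summary does not say how the sensor streams are aligned in time. As a minimal sketch only, the example below uses nearest-timestamp matching, a common generic technique and not necessarily the authors' method. The function names, the tolerance parameter, and the sample data are all hypothetical.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the value in sorted `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick the closer of the two neighbours; ties go to the earlier sample.
    return i - 1 if t - timestamps[i - 1] <= timestamps[i] - t else i


def align_to_reference(reference_ts, streams, tolerance):
    """For each reference timestamp, pick the nearest sample from every stream.

    `streams` maps a sensor name to a (sorted_timestamps, values) pair.
    A sensor with no sample within `tolerance` seconds is reported as None.
    This is an illustrative assumption, not the paper's alignment procedure.
    """
    frames = []
    for t in reference_ts:
        frame = {"t": t}
        for name, (ts, values) in streams.items():
            if not ts:
                frame[name] = None
                continue
            i = nearest_index(ts, t)
            frame[name] = values[i] if abs(ts[i] - t) <= tolerance else None
        frames.append(frame)
    return frames


# Hypothetical data: camera frames act as the reference clock.
camera_ts = [0.0, 0.1, 0.2]
streams = {
    "lidar": ([0.02, 0.12, 0.35], ["scan0", "scan1", "scan2"]),
    "imu": ([0.0, 0.05, 0.1, 0.15, 0.2], [0.0, 0.1, 0.2, 0.3, 0.4]),
}
frames = align_to_reference(camera_ts, streams, tolerance=0.05)
```

Reporting a missing sensor as None, rather than reusing a stale sample, lets a downstream agent say explicitly that a modality was unavailable at that moment.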

Key results

26.31% improvement in vehicle reasoning over baselines
Up to 2.85% enhancement in environmental reasoning
Introduction of a three-tier driving dataset for comprehensive evaluation
Fine-tuned vision-language model for robust object detection and traffic interpretation

Why it matters

It advances the reliability and interpretability of autonomous driving systems by demonstrating how LLM-driven multi-agent fusion can effectively handle complex, real-world sensor data.

Abstract

We introduce DriveAgent, a modular multi-agent autonomous driving framework that leverages large language model (LLM) reasoning combined with multimodal sensor fusion for autonomous driving. DriveAgent orchestrates specialized agents operating on camera, Light Detection and Ranging (LiDAR), Inertial Measurement Unit (IMU), and Global Positioning System (GPS) with LLM-driven analytical processes to deliver temporally aligned perception, causal reasoning, and action recommendations. The framework operates through a modular agent-based pipeline comprising four principal modules: (i) a descriptive analysis agent identifying critical sensor data events based on filtered timestamps, (ii) dedicated vehicle-level analysis conducted by LiDAR and vision agents that collaboratively assess vehicle conditions and movements, (iii) environmental reasoning and causal analysis agents explaining contextual changes and their underlying mechanisms, and (iv) an urgency-aware decision-generation agent prioritizing insights and proposing timely maneuvers. This modular design empowers the LLM to effectively coordinate specialized perception and reasoning agents, delivering cohesive, interpretable insights into complex autonomous driving scenarios. Extensive experiments demonstrate that DriveAgent substantially outperforms baseline methods, achieving a 26.31% improvement in vehicle reasoning and consistent enhancements of up to 2.85% in environmental reasoning. These results highlight the effectiveness of our LLM-driven multi-agent sensor fusion framework in boosting the robustness and reliability of autonomous driving systems.
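
The four-module flow described in the abstract can be pictured with a small sketch. It is a toy illustration under stated assumptions, not the authors' implementation: plain Python functions stand in for the LLM-backed agents, and the `Insight` fields, the 0 to 1 urgency scale, the speed-jump rule for picking critical frames, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Insight:
    source: str     # which agent produced the finding
    message: str    # human-readable description
    urgency: float  # hypothetical scale: 0.0 (informational) to 1.0 (act now)
    maneuver: str   # suggested action


Frame = Dict[str, float]
Agent = Callable[[List[Frame]], List[Insight]]


def select_critical_frames(frames: List[Frame], key: str, threshold: float) -> List[Frame]:
    """Stand-in for module (i): keep frames where `key` jumps by more than `threshold`."""
    critical = []
    for prev, cur in zip(frames, frames[1:]):
        if abs(cur[key] - prev[key]) > threshold:
            critical.append(cur)
    return critical


def vehicle_agent(frames: List[Frame]) -> List[Insight]:
    """Stand-in for module (ii): report vehicle state changes."""
    return [
        Insight("vehicle", f"speed changed to {f['speed']} m/s at t={f['t']}", 0.4, "maintain lane")
        for f in frames
    ]


def environment_agent(frames: List[Frame]) -> List[Insight]:
    """Stand-in for module (iii): explain the change from the surroundings."""
    return [
        Insight("environment", f"obstacle {f['obstacle_m']} m ahead at t={f['t']}", 0.9, "brake")
        for f in frames
        if f["obstacle_m"] < 10.0
    ]


def run_pipeline(frames: List[Frame], agents: List[Agent],
                 key: str = "speed", threshold: float = 2.0) -> List[Insight]:
    """Run modules (i) to (iii), then rank insights by urgency as a stand-in for (iv)."""
    critical = select_critical_frames(frames, key, threshold)
    insights: List[Insight] = []
    for agent in agents:
        insights.extend(agent(critical))
    return sorted(insights, key=lambda ins: ins.urgency, reverse=True)


# Hypothetical fused frames, one dictionary per timestamp.
frames = [
    {"t": 0.0, "speed": 10.0, "obstacle_m": 50.0},
    {"t": 0.1, "speed": 10.2, "obstacle_m": 40.0},
    {"t": 0.2, "speed": 6.0, "obstacle_m": 8.0},
]
ranked = run_pipeline(frames, [vehicle_agent, environment_agent])
top = ranked[0]
```

Keeping every agent behind the same call signature is what makes the design modular in this sketch: an agent can be swapped or added without touching the ranking step.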

Index terms

Agent-Based Systems, AI-Based Methods, Autonomous Agents