M3CAD: Towards Generic Cooperative Autonomous Driving Benchmark
Morui Zhu, Yongqi Zhu, Yihao Zhu, Qi Chen, Deyuan Qu, Song Fu, Qing Yang
AI summary
Problem
Existing cooperative driving datasets are limited in scale, task diversity, and real-world applicability, while current perception methods rely on dense feature fusion that incurs prohibitive communication costs for real-world deployment.
Approach
The authors release M3CAD, a large-scale simulated benchmark supporting multi-vehicle, multi-task, and multi-modality cooperative driving research, alongside a multi-level fusion framework that dynamically selects between dense BEV features, compact queries, and sparse reference points based on available network bandwidth.
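The bandwidth-driven choice between the three fusion levels can be illustrated with a simple greedy policy: send the richest representation whose per-frame payload fits the current link budget. This is a minimal sketch, not the authors' method. The payload shapes are hypothetical and are not taken from M3CAD. They assume a 200x200x256 BEV grid, 900 queries of 256 dimensions, and 900 3-D reference points, all in float32, at an assumed 10 Hz frame rate. The names `FusionLevel`, `LEVELS`, `select_fusion_level`, and `reduction_vs_dense` are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Optional

BYTES_PER_FLOAT32 = 4


@dataclass(frozen=True)
class FusionLevel:
    name: str
    num_values: int  # float32 values transmitted per frame (hypothetical shapes)

    def payload_bytes(self) -> int:
        return self.num_values * BYTES_PER_FLOAT32


# Ordered from richest (largest payload) to most compact. Shapes are assumptions.
LEVELS = [
    FusionLevel("dense_bev", 200 * 200 * 256),      # dense BEV feature map
    FusionLevel("queries", 900 * 256),              # compact object queries
    FusionLevel("reference_points", 900 * 3),       # sparse 3-D reference points
]


def select_fusion_level(bandwidth_bps: float, frame_rate_hz: float = 10.0) -> Optional[FusionLevel]:
    """Return the richest level whose per-frame payload fits the link budget."""
    budget_bytes = bandwidth_bps / 8.0 / frame_rate_hz  # bits/s -> bytes per frame
    for level in LEVELS:
        if level.payload_bytes() <= budget_bytes:
            return level
    return None  # nothing fits: fall back to single-vehicle perception


def reduction_vs_dense(level: FusionLevel) -> float:
    """Fractional bandwidth saving relative to sending dense BEV features."""
    return 1.0 - level.payload_bytes() / LEVELS[0].payload_bytes()


if __name__ == "__main__":
    for bw in (5e9, 100e6, 1e6, 1e5):
        choice = select_fusion_level(bw)
        print(f"{bw:>12.0f} bps -> {choice.name if choice else 'no sharing'}")
    print(f"reference-point saving: {reduction_vs_dense(LEVELS[2]):.4%}")
```

Under these assumed shapes, a 100 Mbps link leaves 1.25 MB per frame, which fits the queries (921,600 bytes) but not the dense BEV map (40.96 MB). Reference points cut the payload by roughly 99.97% relative to dense BEV. The greedy ordering is only one plausible design. A deployed system might also weigh latency, packet loss, or a learned accuracy model.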
Key results
- M3CAD benchmark with 204 sequences, 30k frames, and annotations for six core autonomous driving tasks
- Multi-level fusion framework adaptively balancing communication efficiency and perception accuracy
- Reference point fusion reduces bandwidth by over 99% while maintaining near-optimal tracking and planning accuracy
- Sim-to-real transfer validation showing M3CAD pre-training boosts real-world performance with only 10% of nuScenes data
Why it matters
Offers the research community a scalable, realistic platform to develop and evaluate bandwidth-efficient cooperative autonomous driving systems that bridge the sim-to-real gap.
Abstract
We introduce M3CAD, a comprehensive benchmark designed to advance research in generic cooperative autonomous driving. M3CAD comprises 204 sequences with 30,000 frames. Each sequence includes data from multiple vehicles and different types of sensors, e.g., LiDAR point clouds, RGB images, and GPS/IMU, supporting a variety of autonomous driving tasks, including object detection and tracking, mapping, motion forecasting, occupancy prediction, and path planning. This rich multimodal setup enables M3CAD to support both single-vehicle and multi-vehicle cooperative autonomous driving research. To the best of our knowledge, M3CAD is the most complete benchmark specifically designed for cooperative, multi-task autonomous driving research. To test its effectiveness, we use M3CAD to evaluate both state-of-the-art single-vehicle and cooperative driving solutions, setting baseline performance results. Since most existing cooperative perception methods focus on merging features but often ignore network bandwidth requirements, we propose a new multi-level fusion approach that adaptively balances communication efficiency and perception accuracy based on the current network conditions. We release M3CAD, along with the baseline models and evaluation results, to support the development of robust cooperative autonomous driving systems. All resources will be made publicly available on our project webpage.