A Synthetic Benchmark for Collaborative 3D Semantic Occupancy Prediction in V2X-Enabled Autonomous Driving
Hanlin Wu, Pengfei Lin, Ehsan Javanmardi, Naren Bao, Bo Qian, Hao Si, Manabu Tsukada
AI summary
Problem
Single-vehicle 3D semantic occupancy prediction is constrained by occlusions and limited sensor range, while existing datasets lack dense, voxel-level annotations for multi-agent V2X scenarios.
Approach
The authors introduce a high-resolution synthetic dataset in CARLA with dense voxel annotations and propose a baseline model that fuses multi-agent features via spatial alignment and confidence-guided attention.
Key results
- Co3SOP dataset with high-resolution, dense 3D semantic voxel annotations
- Collaborative baseline model using spatial alignment and confidence-guided attention
- Multi-range benchmarks showing consistent performance gains over single-agent methods
- Scaling prediction accuracy with expanded collaboration range under pose noise
Why it matters
Provides a critical foundation for advancing fine-grained, multi-agent scene understanding in autonomous driving research and simulation.
Abstract
3D semantic occupancy prediction is an emerging perception paradigm in autonomous driving, providing a voxel-level representation of both geometric details and semantic categories. However, despite its fine-grained scene understanding, its effectiveness is inherently constrained in single-vehicle setups by occlusions, restricted sensor range, and narrow viewpoints. To address these limitations, collaborative perception enables the exchange of complementary information, thereby enhancing the completeness and accuracy of predictions. Despite its potential, research on collaborative 3D semantic occupancy prediction is hindered by the lack of dedicated datasets. To bridge this gap, we design a high-resolution semantic voxel sensor in CARLA to produce dense and comprehensive annotations for V2X scenarios. We further develop a baseline model that performs inter-agent feature fusion via spatial alignment and attention aggregation. In addition, we establish benchmarks with varying prediction ranges designed to systematically assess the impact of spatial extent on collaborative prediction. Experimental results demonstrate the superior performance of our baseline enabled by vehicle collaboration, with increasing gains observed as the prediction range expands. Our code and data are available at https://github.com/tlab-wide/Co3SOP.
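The inter-agent fusion step described above (spatial alignment followed by attention aggregation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes each agent shares a bird's-eye-view feature grid of shape (C, H, W) plus a per-cell confidence map, that alignment to the ego frame reduces to an integer cell translation (a real system would warp with the full relative pose), and that attention is a per-cell softmax over agents weighted by confidence. The names `shift_grid` and `fuse_agents` are hypothetical.

```python
import numpy as np

def shift_grid(grid, dx, dy):
    """Translate the last two axes of `grid` by (dx, dy) cells, zero-filling.

    Assumption: alignment is a pure integer translation between agent frames.
    """
    out = np.zeros_like(grid)
    h, w = grid.shape[-2:]
    if abs(dx) >= h or abs(dy) >= w:
        return out  # no overlap with the ego grid
    src_x = slice(max(0, -dx), min(h, h - dx))
    dst_x = slice(max(0, dx), min(h, h + dx))
    src_y = slice(max(0, -dy), min(w, w - dy))
    dst_y = slice(max(0, dy), min(w, w + dy))
    out[..., dst_x, dst_y] = grid[..., src_x, src_y]
    return out

def fuse_agents(features, confidences, offsets):
    """Fuse per-agent features in the ego frame.

    features:    list of float arrays, each (C, H, W)
    confidences: list of float arrays, each (H, W), unnormalized scores
    offsets:     list of (dx, dy) integer translations into the ego frame
    Returns a fused (C, H, W) array.
    """
    feats, confs, masks = [], [], []
    for f, c, (dx, dy) in zip(features, confidences, offsets):
        feats.append(shift_grid(f, dx, dy))
        confs.append(shift_grid(c, dx, dy))
        # Mask marks the cells an agent actually observes after alignment.
        masks.append(shift_grid(np.ones_like(c), dx, dy))
    f = np.stack(feats)   # (N, C, H, W)
    c = np.stack(confs)   # (N, H, W)
    m = np.stack(masks)   # (N, H, W)

    # Masked softmax over the agent axis, computed independently per cell.
    logits = np.where(m > 0, c, -1e9)
    logits = logits - logits.max(axis=0, keepdims=True)
    w = np.exp(logits) * m
    w = w / np.maximum(w.sum(axis=0, keepdims=True), 1e-8)

    # Weighted sum of aligned features; cells seen by no agent stay zero.
    return (w[:, None] * f).sum(axis=0)

if __name__ == "__main__":
    ego = np.ones((1, 3, 3))
    other = np.full((1, 3, 3), 3.0)
    conf = np.zeros((3, 3))
    fused = fuse_agents([ego, other], [conf, conf], [(0, 0), (1, 0)])
    print(fused[0])
```

In the usage example, the second agent's grid is shifted by one row, so the first row of the fused map comes from the ego agent alone, while the remaining rows average both agents because their confidences are equal.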