CoBEVMoE: Heterogeneity-Aware Feature Fusion with Dynamic Mixture-of-Experts for Collaborative Perception
Lingzhao Kong, Jiacheng Lin, Siyu Li, Kai Luo, Zhiyong Li, Kailun Yang
AI summary
Problem
Existing intermediate fusion methods in collaborative perception primarily align similar features, overlooking the perceptual diversity and heterogeneous observations caused by agents' differing viewpoints and spatial positions.
Approach
CoBEVMoE operates in BEV space and introduces a Dynamic Mixture-of-Experts (DMoE) module that generates agent-conditioned expert kernels. It is paired with a Dynamic Expert Metric Loss (DEML) that encourages inter-expert diversity, so the fused representation captures both shared semantics and agent-specific cues.
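To make the idea of agent-conditioned experts concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each expert is a 1x1 convolution kernel generated from the agent's globally pooled BEV feature, and that expert outputs are combined by a per-cell softmax gate. The function name dynamic_moe_fuse, the generator weights w_gen, and the gating weights w_gate are hypothetical and do not come from the paper.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def dynamic_moe_fuse(feats, w_gen, w_gate):
    """Fuse per-agent BEV features with dynamically generated experts.

    feats:  (N, C, H, W) BEV features from N agents, assumed to be already
            warped into a shared (ego) coordinate frame.
    w_gen:  (C, C * C) hypothetical weights of a linear kernel generator.
    w_gate: (C,) hypothetical weights of a linear per-cell gate.
    """
    n, c, h, w = feats.shape
    # Global context per agent: average over the spatial grid -> (N, C).
    context = feats.mean(axis=(2, 3))
    # Assumption: one 1x1 conv kernel (C_out x C_in) per agent, produced
    # from that agent's own context, so each expert depends on its input.
    kernels = (context @ w_gen).reshape(n, c, c)
    # Apply each agent's expert to that agent's feature map -> (N, C, H, W).
    expert_out = np.einsum("nij,njhw->nihw", kernels, feats)
    # Per-cell gate: score every agent at every BEV cell, normalize over agents.
    logits = np.einsum("c,nchw->nhw", w_gate, expert_out)
    gates = softmax(logits, axis=0)
    # Weighted sum over agents gives the fused BEV map -> (C, H, W).
    fused = (gates[:, None, :, :] * expert_out).sum(axis=0)
    return fused, kernels, gates
```

With three agents and C = 8 channels on a 16 x 16 grid, feats has shape (3, 8, 16, 16) and the fused map has shape (8, 16, 16). Because the kernels are computed from each agent's own features, two agents with different views of the scene receive different experts, which is the property the approach relies on.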
Key results
- Achieves state-of-the-art performance on OPV2V and DAIR-V2X-C benchmarks
- Improves camera-based BEV segmentation IoU by +1.5% on OPV2V
- Boosts LiDAR-based 3D object detection AP@0.5 by +3.0% on DAIR-V2X-C
- Outperforms strong baselines like CoBEVT and AttFuse across both modalities
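For readers unfamiliar with the segmentation metric cited above, here is a small sketch of intersection-over-union (IoU) on binary BEV masks. The function name bev_iou and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the benchmarks' official evaluation code may differ.

```python
import numpy as np


def bev_iou(pred, target):
    """Intersection-over-union between two binary BEV occupancy masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)


# One cell overlaps out of three occupied cells in total, so IoU = 1/3.
example = bev_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```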
Why it matters
Enables more robust and accurate environmental awareness for multi-agent autonomous driving systems, particularly under partial occlusions and sensor limitations.
Abstract
Collaborative perception aims to extend sensing coverage and improve perception accuracy by sharing information among multiple agents. However, due to differences in viewpoints and spatial positions, agents often acquire heterogeneous observations. Existing intermediate fusion methods primarily focus on aligning similar features, often overlooking the perceptual diversity among agents. To address this limitation, we propose CoBEVMoE, a novel collaborative perception framework that operates in the Bird’s Eye View (BEV) space and incorporates a Dynamic Mixture-of-Experts (DMoE) architecture. In DMoE, each expert is dynamically generated based on the input features of a specific agent, enabling it to extract distinctive and reliable cues while attending to shared semantics. This design allows the fusion process to explicitly model both feature similarity and heterogeneity across agents. Furthermore, we introduce a Dynamic Expert Metric Loss (DEML) to enhance inter-expert diversity and improve the discriminability of the fused representation. Extensive experiments on the OPV2V and DAIR-V2X-C datasets demonstrate that CoBEVMoE achieves state-of-the-art performance. Specifically, it improves the IoU for camera-based BEV segmentation by +1.5% on OPV2V and the AP@0.5 for LiDAR-based 3D object detection by +3.0% on DAIR-V2X-C, verifying the effectiveness of expert-based heterogeneous feature modeling in multi-agent collaborative perception. The source code will be made publicly available at https://github.com/godk0509/CoBEVMoE.
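The abstract does not give the exact form of DEML, so the sketch below is only a generic stand-in for a loss that rewards inter-expert diversity. It assumes a hinge penalty on the pairwise cosine similarity between flattened expert kernels. The function name expert_diversity_loss and the margin parameter are hypothetical and should not be read as the paper's formulation.

```python
import numpy as np


def expert_diversity_loss(kernels, margin=0.5):
    """Penalize pairs of experts whose kernels are too similar.

    kernels: (N, ...) array holding one kernel per expert (agent).
    margin:  hypothetical cosine-similarity threshold; pairs at or below
             it incur no penalty.
    """
    n = kernels.shape[0]
    flat = kernels.reshape(n, -1)
    # L2-normalize each kernel so the dot product is a cosine similarity.
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    similarity = flat @ flat.T
    # Consider each unordered pair of distinct experts once.
    rows, cols = np.triu_indices(n, k=1)
    if rows.size == 0:
        return 0.0  # a single expert has no pair to compare against
    penalties = np.maximum(similarity[rows, cols] - margin, 0.0)
    return float(penalties.mean())
```

Under these assumptions, identical kernels yield a loss close to 1 - margin, while mutually orthogonal kernels yield zero, so minimizing the term pushes experts toward distinct representations.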