ICRA 2026

CoBEVMoE: Heterogeneity-Aware Feature Fusion with Dynamic Mixture-of-Experts for Collaborative Perception

Lingzhao Kong, Jiacheng Lin, Siyu Li, Kai Luo, Zhiyong Li, Kailun Yang

AI summary

Dynamically generating agent-specific experts for feature fusion significantly improves multi-agent collaborative perception accuracy by preserving unique perceptual cues.

Collaborative perception, Dynamic Mixture-of-Experts, Heterogeneous feature fusion, Bird’s Eye View, Multi-agent systems, Autonomous driving

Problem

Existing intermediate fusion methods in collaborative perception primarily align similar features, overlooking the perceptual diversity and heterogeneous observations caused by agents' differing viewpoints and spatial positions.

Approach

CoBEVMoE operates in BEV space and introduces a Dynamic Mixture-of-Experts (DMoE) module that generates agent-conditioned expert kernels, paired with a Dynamic Expert Metric Loss (DEML) to enforce diversity while fusing shared semantics and unique cues.
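
To make the idea of agent-conditioned experts concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the hypernetwork (`make_hypernet`), the 1x1 kernel shape, the global average pooling, and the per-cell softmax gate are all assumptions chosen for illustration, and the per-agent BEV features are assumed to be already aligned to a common grid.

```python
import math
import random


def make_hypernet(channels, seed=0):
    """Hypothetical hypernetwork weights mapping a pooled C-vector to a C x C kernel."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(channels)]
            for _ in range(channels * channels)]


def global_pool(feat):
    """Average a BEV feature map given as a list of cells, each a C-vector."""
    n = len(feat)
    return [sum(cell[c] for cell in feat) / n for c in range(len(feat[0]))]


def generate_expert(hypernet, feat):
    """Build a 1x1 kernel (C x C) conditioned on one agent's own features."""
    pooled = global_pool(feat)
    c = len(pooled)
    flat = [sum(w * p for w, p in zip(row, pooled)) for row in hypernet]
    # Assumption: add the identity so each expert starts near a pass-through.
    return [[flat[i * c + j] + (1.0 if i == j else 0.0) for j in range(c)]
            for i in range(c)]


def apply_expert(kernel, feat):
    """Apply the 1x1 kernel to every BEV cell."""
    return [[sum(k * x for k, x in zip(row, cell)) for row in kernel]
            for cell in feat]


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def fuse(agent_feats, hypernet):
    """Fuse per-agent BEV features with a per-cell softmax gate over expert outputs."""
    expert_out = [apply_expert(generate_expert(hypernet, f), f) for f in agent_feats]
    n_cells = len(agent_feats[0])
    channels = len(agent_feats[0][0])
    fused = []
    for i in range(n_cells):
        # Assumption: the gate score is the mean activation of each expert at this cell.
        gates = softmax([sum(out[i]) / channels for out in expert_out])
        fused.append([sum(g * out[i][c] for g, out in zip(gates, expert_out))
                      for c in range(channels)])
    return fused


# Two agents, four BEV cells, three channels each.
ego = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0], [0.0, 1.0, 0.0], [2.0, 0.0, 1.0]]
other = [[0.0, 1.0, 0.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
fused = fuse([ego, other], make_hypernet(3))
print(len(fused), len(fused[0]))
```

Because each kernel is computed from the agent's own pooled features, two agents with different observations get different experts, while the shared hypernetwork weights let them still respond to common semantics.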

Key results

Achieves state-of-the-art performance on OPV2V and DAIR-V2X-C benchmarks
Improves camera-based BEV segmentation IoU by +1.5% on OPV2V
Boosts LiDAR-based 3D object detection AP@0.5 by +3.0% on DAIR-V2X-C
Outperforms strong baselines like CoBEVT and AttFuse across both modalities
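
For readers unfamiliar with the segmentation metric, the sketch below shows how IoU can be computed between a predicted and a ground-truth binary BEV mask. The function name `bev_iou` and the list-of-lists mask format are illustrative assumptions, not the benchmarks' evaluation code.

```python
def bev_iou(pred, target):
    """IoU between two binary BEV masks given as equally sized 2D lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(bev_iou(pred, target))  # 2 overlapping cells / 4 occupied cells = 0.5
```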

Why it matters

Enables more robust and accurate environmental awareness for multi-agent autonomous driving systems, particularly under partial occlusions and sensor limitations.

Abstract

Collaborative perception aims to extend sensing coverage and improve perception accuracy by sharing information among multiple agents. However, due to differences in viewpoints and spatial positions, agents often acquire heterogeneous observations. Existing intermediate fusion methods primarily focus on aligning similar features, often overlooking the perceptual diversity among agents. To address this limitation, we propose CoBEVMoE, a novel collaborative perception framework that operates in the Bird’s Eye View (BEV) space and incorporates a Dynamic Mixture-of-Experts (DMoE) architecture. In DMoE, each expert is dynamically generated based on the input features of a specific agent, enabling it to extract distinctive and reliable cues while attending to shared semantics. This design allows the fusion process to explicitly model both feature similarity and heterogeneity across agents. Furthermore, we introduce a Dynamic Expert Metric Loss (DEML) to enhance inter-expert diversity and improve the discriminability of the fused representation. Extensive experiments on the OPV2V and DAIR-V2X-C datasets demonstrate that CoBEVMoE achieves state-of-the-art performance. Specifically, it improves the IoU for camera-based BEV segmentation by +1.5% on OPV2V and the AP@0.5 for LiDAR-based 3D object detection by +3.0% on DAIR-V2X-C, verifying the effectiveness of expert-based heterogeneous feature modeling in multi-agent collaborative perception. The source code will be made publicly available at https://github.com/godk0509/CoBEVMoE.
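
The abstract does not give the DEML formula, so the following is only a generic stand-in that illustrates what "enhancing inter-expert diversity" can mean in practice: a hinge penalty on pairwise cosine similarity between flattened expert representations. The function names, the `margin` parameter, and the choice of cosine similarity are assumptions and should not be read as the paper's loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def diversity_penalty(expert_vecs, margin=0.5):
    """Mean hinge penalty on pairwise cosine similarity above `margin` (hypothetical)."""
    pairs = [(i, j)
             for i in range(len(expert_vecs))
             for j in range(i + 1, len(expert_vecs))]
    if not pairs:
        return 0.0
    total = sum(max(0.0, cosine(expert_vecs[i], expert_vecs[j]) - margin)
                for i, j in pairs)
    return total / len(pairs)


# Near-duplicate experts are penalised; orthogonal experts are not.
print(diversity_penalty([[1.0, 0.0], [1.0, 0.0]]))
print(diversity_penalty([[1.0, 0.0], [0.0, 1.0]]))
```

A term like this would be added to the task loss with a small weight, so that experts generated for different agents are discouraged from collapsing onto the same representation.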

Index terms

Computer Vision for Transportation, Deep Learning for Visual Perception, Multi-Robot Systems