SEF-MAP: Subspace-Decomposed Expert Fusion for Robust Multimodal HD Map Prediction
Haoxiang Fu, Lingfeng Zhang, Hao Li, Ruibing Hu, Zhengrong Li, Guanjing Liu, Zimu Tan, Long Chen, Hangjun Ye, Xiaoshuai Hao
AI summary
Problem
Current multimodal fusion methods for HD map prediction suffer from misalignment between camera and LiDAR features, and their accuracy drops when one sensor is degraded by low light, occlusion, or sparse point clouds.
Approach
The framework decomposes bird's-eye-view features into four semantic subspaces, assigns each to a specialized expert, and uses an uncertainty-aware gating mechanism to dynamically weight expert outputs based on predictive variance.
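The gating idea can be illustrated with a minimal sketch for a single BEV cell. This section does not give the gate's exact form, so everything here is an assumption: the gate is taken to be a softmax over `logit - log(variance)`, and the balance regularizer is taken to be the KL divergence between average expert usage and a uniform distribution. The function names are hypothetical.

```python
import math


def uncertainty_gate(logits, variances, eps=1e-6):
    """Per-cell expert weights: softmax of (gate logit - log predictive variance).

    Assumed form: an expert with higher predictive variance gets a lower weight.
    """
    scores = [l - math.log(v + eps) for l, v in zip(logits, variances)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_cell(expert_outputs, weights):
    """Weighted sum of the expert feature vectors for one BEV cell."""
    dim = len(expert_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, expert_outputs))
        for d in range(dim)
    ]


def usage_balance_penalty(per_cell_weights):
    """KL(mean usage || uniform): zero when experts are used equally on average."""
    num_cells = len(per_cell_weights)
    num_experts = len(per_cell_weights[0])
    usage = [
        sum(w[k] for w in per_cell_weights) / num_cells
        for k in range(num_experts)
    ]
    return sum(u * math.log(u * num_experts) for u in usage if u > 0)


# Four experts (LiDAR-private, Image-private, Shared, Interaction), equal logits;
# the last expert reports a much larger predictive variance.
weights = uncertainty_gate([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 10.0])
fused = fuse_cell([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]], weights)
```

With equal logits, the high-variance expert receives almost no weight, so its output barely affects `fused`. The penalty grows as the gate routes every cell to the same expert, which is the collapse the regularizer is meant to discourage.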
Key results
- Explicitly disentangles multimodal BEV features into four semantic subspaces to mitigate cross-modal misalignment.
- Introduces distribution-aware masking with specialization losses to enforce expert roles and improve unimodal robustness.
- Designs an uncertainty-aware gating mechanism with balance regularizers for adaptive expert selection and collapse prevention.
- Achieves state-of-the-art performance, surpassing prior methods by +4.2% mAP on nuScenes and +4.8% mAP on Argoverse2.
Why it matters
Provides autonomous driving systems with a robust, interpretable fusion framework that maintains high map prediction accuracy even when camera or LiDAR inputs are partially degraded or missing.
Abstract
High-definition (HD) maps are essential for autonomous driving, yet multi-modal fusion often suffers from inconsistency between camera and LiDAR modalities, leading to performance degradation under low-light conditions, occlusions, or sparse point clouds. To address this, we propose SEF-MAP, a Subspace-Expert Fusion framework for robust multi-modal HD map prediction. The key idea is to explicitly disentangle BEV features into four semantic subspaces: LiDAR-private, Image-private, Shared, and Interaction. Each subspace is assigned a dedicated expert, thereby preserving modality-specific cues while capturing cross-modal consensus. To adaptively combine expert outputs, we introduce an uncertainty-aware gating mechanism at the BEV-cell level, where unreliable experts are down-weighted based on predictive variance, complemented by a usage balance regularizer to prevent expert collapse. To enhance robustness in degraded conditions and promote role specialization, we further propose distribution-aware masking: during training, modality-drop scenarios are simulated using EMA-statistical surrogate features, and a specialization loss enforces distinct behaviors of private, shared, and interaction experts across complete and masked inputs. Experiments on nuScenes and Argoverse2 benchmarks demonstrate that SEF-MAP achieves state-of-the-art performance, surpassing prior methods by +4.2% and +4.8% in mAP, respectively. SEF-MAP provides a robust and effective solution for multi-modal HD map prediction under diverse and degraded conditions.
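One possible reading of distribution-aware masking is sketched below. The abstract does not say how the surrogate features are built, so the sketch makes several assumptions: per-channel mean and variance are tracked with an exponential moving average, a dropped modality is replaced by Gaussian samples drawn from those statistics, and the class name, function name, `momentum`, and `drop_prob` are all hypothetical.

```python
import random


class EmaFeatureStats:
    """Running per-channel mean and variance of one modality's BEV features."""

    def __init__(self, num_channels, momentum=0.99):
        self.momentum = momentum
        self.mean = [0.0] * num_channels
        self.var = [1.0] * num_channels
        self.initialized = False

    def update(self, cells):
        """cells: list of per-cell feature vectors, each of length num_channels."""
        n = len(cells)
        channels = range(len(self.mean))
        batch_mean = [sum(c[j] for c in cells) / n for j in channels]
        batch_var = [
            sum((c[j] - batch_mean[j]) ** 2 for c in cells) / n for j in channels
        ]
        if not self.initialized:
            # The first batch seeds the statistics directly.
            self.mean, self.var, self.initialized = batch_mean, batch_var, True
            return
        m = self.momentum
        self.mean = [m * a + (1 - m) * b for a, b in zip(self.mean, batch_mean)]
        self.var = [m * a + (1 - m) * b for a, b in zip(self.var, batch_var)]

    def surrogate(self, num_cells, rng):
        """Sample stand-in features from the tracked per-channel Gaussians."""
        return [
            [rng.gauss(mu, v ** 0.5) for mu, v in zip(self.mean, self.var)]
            for _ in range(num_cells)
        ]


def maybe_drop_modality(cells, stats, drop_prob, rng):
    """Update the statistics, then swap in surrogates with probability drop_prob."""
    stats.update(cells)
    if rng.random() < drop_prob:
        return stats.surrogate(len(cells), rng), True
    return cells, False


rng = random.Random(0)
lidar_stats = EmaFeatureStats(num_channels=2, momentum=0.9)
lidar_cells = [[1.0, 2.0], [3.0, 4.0]]
features, dropped = maybe_drop_modality(lidar_cells, lidar_stats, 0.5, rng)
```

Compared with zeroing out a missing modality, sampling from running statistics keeps the masked input close to the feature distribution the experts see during normal training. Whether SEF-MAP samples, uses the mean alone, or does something else is not stated here.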