RelMap: Enhancing Online Map Construction with Class-Aware Spatial Relation and Semantic Priors
Tianhui Cai, Yun Zhang, Zewei Zhou, Zhiyu Huang, Jiaqi Ma
AI summary
Problem
Current Transformer-based online HD map construction methods treat map elements independently, ignoring the spatial and semantic relationships between them, which limits prediction accuracy and generalization.
Approach
RelMap integrates a Class-aware Spatial Relation Prior to encode geometric dependencies between instances and a Mixture-of-Experts Semantic Prior that dynamically routes features to class-specific experts for refined decoding.
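To make the idea of a class-aware spatial relation concrete, the following minimal sketch computes a pairwise bias between map instances from their relative positions and class pair. It is an illustration under stated assumptions, not the paper's encoder: the fixed table `CLASS_PAIR_WEIGHT`, the use of centroids, and the function names are all hypothetical stand-ins for what RelMap learns end to end.

```python
import math

# Hypothetical class ids: 0 = lane divider, 1 = crosswalk, 2 = road boundary.
# Hypothetical fixed weights standing in for a learnable class-aware encoder.
CLASS_PAIR_WEIGHT = [
    [0.2, 1.0, 0.5],
    [1.0, 0.2, 1.0],
    [0.5, 1.0, 0.2],
]

def centroid(points):
    """Mean (x, y) of a polyline given as a list of (x, y) tuples."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def relation_bias(instances):
    """Return an N x N matrix of pairwise relation scores.

    instances: list of (class_id, polyline) tuples.
    Each score is the negative centroid distance scaled by a class-pair
    weight, so nearby instances receive a larger (less negative) score.
    """
    centers = [centroid(poly) for _, poly in instances]
    n = len(instances)
    bias = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = centers[j][0] - centers[i][0]
            dy = centers[j][1] - centers[i][1]
            weight = CLASS_PAIR_WEIGHT[instances[i][0]][instances[j][0]]
            bias[i][j] = -weight * math.hypot(dx, dy)
    return bias

instances = [
    (0, [(0.0, 0.0), (0.0, 10.0)]),   # lane divider
    (2, [(3.0, 0.0), (3.0, 10.0)]),   # road boundary, 3 m away
    (1, [(20.0, 0.0), (20.0, 4.0)]),  # crosswalk, far away
]
bias = relation_bias(instances)
```

A matrix like `bias` could, for example, be added to attention logits between instance queries so that the decoder attends more strongly to spatially related elements; whether RelMap injects its prior this way is not stated here.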
Key results
- State-of-the-art performance on nuScenes and Argoverse 2 benchmarks
- Seamless compatibility with single-frame and temporal perception backbones
- Improved vectorized map prediction accuracy across lane dividers, crosswalks, and road boundaries
- Elimination of separate routing networks in MoE, reducing model complexity and training overhead
Why it matters
Enables more scalable and accurate real-time HD map generation for autonomous vehicles by leveraging intrinsic map topology and semantics.
Abstract
Online high-definition (HD) map construction is crucial for scaling autonomous driving systems. While Transformer-based methods have become prevalent in online HD map construction, most existing approaches overlook the inherent spatial dependencies and semantic relationships between map elements, which constrains their accuracy and generalization capabilities. To address this, we propose RelMap, an end-to-end framework that explicitly models both spatial relations and semantic priors to enhance online HD map construction. Specifically, we introduce a Class-aware Spatial Relation Prior, which explicitly encodes relative positional dependencies between map elements using a learnable class-aware relation encoder. Additionally, we design a Mixture-of-Experts-based Semantic Prior, which routes features to class-specific experts based on predicted class probabilities, refining instance feature decoding. RelMap is compatible with both single-frame and temporal perception backbones, achieving state-of-the-art performance on the nuScenes and Argoverse 2 datasets.
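The routing described above, where predicted class probabilities decide which class-specific expert refines an instance feature, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes soft routing (a probability-weighted sum over all experts), and the experts themselves are hypothetical elementwise affine maps standing in for learned networks.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale, shift):
    """Hypothetical stand-in for a class-specific expert network."""
    return lambda feature: [scale * x + shift for x in feature]

# One expert per map class (assumed order: divider, crosswalk, boundary).
EXPERTS = [make_expert(1.0, 0.0), make_expert(2.0, 0.0), make_expert(0.5, 1.0)]

def semantic_prior_refine(feature, class_logits):
    """Refine an instance feature using class probabilities as gate weights.

    The class probabilities themselves act as the gate, so this sketch
    needs no separate routing network.
    """
    probs = softmax(class_logits)
    outputs = [expert(feature) for expert in EXPERTS]
    refined = [
        sum(p * out[i] for p, out in zip(probs, outputs))
        for i in range(len(feature))
    ]
    return refined, probs

# An instance that the classifier confidently labels as a lane divider.
refined, probs = semantic_prior_refine([1.0, -2.0], [4.0, 0.0, 0.0])
```

With confident class logits the output is dominated by a single expert, while ambiguous logits blend several experts. A hard top-1 assignment would be an equally plausible reading of "routes"; the soft variant is used here only because it is simple to show.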