ICRA 2026

Unified Map Prior Encoder for Mapping and Planning

Zongzheng Zhang, Sizhe Zou, Guantian Zheng, Zhenxin Zhu, Yu Gao, Guoxuan Chi, Shuo Wang, Yuwen Heng, Zhigang Sun, Yiru Wang, Hao Sun, Chao Ma, Zhen Li, Anqing Jiang, Hao Zhao

AI summary

A single unified encoder fuses any subset of four heterogeneous map priors (HD and SD vector maps, rasterized SD maps, and satellite imagery) into BEV features, improving both online mapping accuracy and end-to-end planning safety without retraining for each prior combination.

Unified Map Prior Encoder, BEV Fusion, Online Mapping, End-to-End Planning, Heterogeneous Priors, Autonomous Driving

Problem

Autonomous driving pipelines typically rely on sensor-centric features and struggle to combine heterogeneous map priors due to pose drift, modality gaps, and inconsistent availability at test time.

Approach

UMPE uses parallel vector and raster branches with frame-wise SE(2) alignment, confidence-biased cross-attention, and zero-initialized residual fusion to dynamically integrate any subset of map priors into BEV features.
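As a rough illustration of two of these ingredients, the sketch below shows single-query attention with an additive confidence bias, followed by a residual fusion whose gate starts at zero. This is a minimal sketch, not the released implementation: the `log(confidence)` form of the bias, the scalar `gate`, and all function names are assumptions made for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_biased_attention(query, keys, values, confidences, eps=1e-6):
    """Single-query dot-product attention with an additive confidence bias.

    Assumption: the bias is log(confidence), so low-confidence tokens are
    softly down-weighted; the paper may use a different form.
    """
    d = len(query)
    logits = []
    for key, conf in zip(keys, confidences):
        score = sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        logits.append(score + math.log(conf + eps))
    weights = softmax(logits)
    out = [sum(w * v[j] for w, v in zip(weights, values))
           for j in range(len(values[0]))]
    return out, weights

def residual_fuse(bev, prior, gate=0.0):
    """Residual fusion bev + gate * prior.

    Assumption: a scalar gate stands in for a learned layer. With gate = 0
    (the initial value) the BEV feature passes through unchanged.
    """
    return [b + gate * p for b, p in zip(bev, prior)]

# Two prior tokens match the query equally well but differ in confidence.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
prior, weights = confidence_biased_attention(query, keys, values, [0.9, 0.1])
fused_at_init = residual_fuse(query, prior, gate=0.0)
```

With equal content scores, the attention weights follow the confidences, and at initialization the fused feature equals the input BEV feature, so the prior can only start contributing once the gate is trained away from zero.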

Key results

+5.9 mAP on nuScenes mapping with the MapTRv2 backbone (61.5 → 67.4)
+5.3 mAP on nuScenes mapping with the MapQR backbone (66.4 → 71.7)
+4.1 mAP over strong baselines on Argoverse2
E2E planning (VAD backbone, nuScenes): average L2 trajectory error drops from 0.72 m to 0.42 m (−0.30 m) and collision rate from 0.22% to 0.12% (−0.10 percentage points)

Why it matters

Lets a single trained model adapt to whichever map priors are available at test time, without per-scenario retraining; a model trained with all priors outperforms single-prior models even when only one prior is present.

Abstract

Online mapping and end-to-end (E2E) planning in autonomous driving are still largely sensor-centric, leaving rich map priors (HD/SD vector maps, rasterized SD maps, and satellite imagery) underused due to heterogeneity, pose drift, and inconsistent availability at test time. We present UMPE, a Unified Map Prior Encoder that can ingest any subset of four priors and fuse them with BEV features for both mapping and planning. UMPE has two branches. The vector encoder pre-aligns HD/SD polylines with a frame-wise SE(2) correction, encodes points via multi-frequency sinusoidal features, and produces polyline tokens with confidence scores. BEV queries then apply cross-attention with confidence bias, followed by normalized channel-wise gating to avoid length imbalance and to softly down-weight uncertain sources. The raster encoder shares a ResNet-18 backbone conditioned by FiLM (scaling/shift at every stage), performs SE(2) micro-alignment, and injects priors through zero-initialized residual fusion so the network starts from a do-no-harm baseline and learns to add only useful prior evidence. A vector-then-raster fusion order reflects the inductive bias of “geometry first, appearance second.” On nuScenes mapping, UMPE lifts MapTRv2 from 61.5 → 67.4 mAP (+5.9) and MapQR from 66.4 → 71.7 mAP (+5.3). On Argoverse2, UMPE adds +4.1 mAP over strong baselines. UMPE is compositional: when trained with all priors, it outperforms single-prior models even when only one prior is available at test time, demonstrating powerset robustness. For E2E planning (VAD backbone, nuScenes), UMPE reduces trajectory error from 0.72 → 0.42 m L2 (avg. −0.30 m) and collision rate from 0.22% → 0.12% (−0.10%), surpassing recent prior-injection methods. These results show that a unified, alignment-aware treatment of heterogeneous map priors yields better mapping and better planning. Code and dataset are released at https://github.com/Ethan-Zheng136/UMPE
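The first two steps of the vector branch described above, an SE(2) correction of polyline points followed by multi-frequency sinusoidal encoding, can be sketched as follows. This is an illustrative sketch only: the function names, the rotate-then-translate convention, and the frequency schedule (powers of two times π) are assumptions, and in UMPE the correction parameters would presumably be predicted per frame rather than passed in by hand.

```python
import math

def se2_apply(points, dx, dy, theta):
    """Apply an SE(2) transform to 2D points.

    Each (x, y) is rotated by theta (radians) about the origin and then
    translated by (dx, dy). Hypothetical helper, not the paper's API.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]

def sinusoidal_features(point, num_freqs=4):
    """Encode a 2D point with sin/cos pairs at several frequencies.

    Assumption: frequencies are 2^k * pi for k = 0 .. num_freqs - 1, and
    coordinates are already normalized to a small range such as [-1, 1].
    Returns 2 coords * num_freqs * 2 values.
    """
    feats = []
    for coord in point:
        for k in range(num_freqs):
            freq = (2.0 ** k) * math.pi
            feats.append(math.sin(freq * coord))
            feats.append(math.cos(freq * coord))
    return feats

# A short polyline, corrected by a small pose offset, then encoded per point.
polyline = [(0.10, 0.00), (0.20, 0.05), (0.30, 0.10)]
aligned = se2_apply(polyline, dx=0.02, dy=-0.01, theta=math.radians(1.5))
encoded = [sinusoidal_features(p, num_freqs=4) for p in aligned]
```

Each point becomes a 16-dimensional vector here; a polyline token would then be obtained by pooling or otherwise aggregating its point features, a step the abstract does not specify.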

Index terms

Reactive and Sensor-Based Planning, Semantic Scene Understanding, Intelligent Transportation Systems