ICRA 2026

RefDiffMap: Diffusion-Guided Progressive Refinement for Vectorized HD Map Construction

Wenjie Gao, Entao Chang, Jiawei Fu, Ziyu Zhu, Shitao Chen, Nanning Zheng

AI summary

RefDiffMap uses a diffusion-guided iterative refinement loop to improve HD map localization accuracy, particularly with lightweight backbones.

HD Map Construction, Diffusion Models, Vectorized Mapping, Progressive Refinement, Autonomous Driving, BEV Perception

Problem

Single-pass transformer methods for vectorized HD map construction struggle to precisely localize map elements in large-scale BEV spaces, a limitation severely worsened by lightweight backbones with less distinctive features.

Approach

The method recasts map construction as a progressive denoising process, using a novel query generator that dynamically resamples BEV features based on intermediate noisy geometry to create a continuous geometry-feature co-evolution loop.
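
The loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the BEV map here is a hand-built offset field rather than learned features, the fixed-gain update stands in for the transformer decoder, and no diffusion noise schedule is modeled. The names `make_toy_bev`, `sample_bev`, `refine`, and the `gain` parameter are hypothetical. It only shows the general idea of resampling features at the current noisy geometry on every step and using them to correct that geometry.

```python
import numpy as np


def make_toy_bev(h=32, w=32, lane_x=20.0):
    """Toy BEV map (assumption): each cell stores the (dx, dy) offset
    from that cell to a vertical lane located at x = lane_x."""
    xs = np.arange(w, dtype=np.float64)
    offset_x = np.tile(lane_x - xs, (h, 1))            # shape (h, w)
    offset_y = np.zeros_like(offset_x)                  # lane is vertical
    return np.stack([offset_x, offset_y], axis=-1)      # shape (h, w, 2)


def sample_bev(bev, pts):
    """Bilinearly sample bev (h, w, c) at float points pts (n, 2) as (x, y)."""
    h, w, _ = bev.shape
    x = np.clip(pts[:, 0], 0, w - 1)
    y = np.clip(pts[:, 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]
    top = bev[y0, x0] * (1 - fx) + bev[y0, x1] * fx
    bottom = bev[y1, x0] * (1 - fx) + bev[y1, x1] * fx
    return top * (1 - fy) + bottom * fy


def refine(bev, pts, steps=4, gain=0.5):
    """Iteratively refine noisy points: resample features at the current
    geometry, then move the geometry using those features."""
    pts = pts.copy()
    for _ in range(steps):
        feats = sample_bev(bev, pts)   # features follow the current geometry
        pts = pts + gain * feats       # stand-in for a learned decoder update
    return pts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10
    noisy = np.stack(
        [20.0 + rng.uniform(-8.0, 8.0, n), np.linspace(2.0, 29.0, n)], axis=1
    )
    refined = refine(make_toy_bev(), noisy)
    print("max error before:", np.abs(noisy[:, 0] - 20.0).max())
    print("max error after: ", np.abs(refined[:, 0] - 20.0).max())
```

In this toy setting each step halves the remaining horizontal error, because the sampled feature is exactly the offset to the lane. A learned model has no such guarantee; the sketch only conveys why re-sampling features at the updated geometry lets later steps correct earlier localization errors.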

Key results

Achieves competitive performance on nuScenes and Argoverse 2 datasets
Boosts mAP by 11.3% over MapTRv2 baseline using a ResNet-18 backbone
Introduces a denoising query generator for dynamic geometry-feature alignment
Validates the iterative refinement loop as the primary driver of performance gains

Why it matters

Enables more accurate and robust online HD map construction for autonomous driving, especially in resource-constrained scenarios.

Abstract

High-definition (HD) map learning serves as an essential component of autonomous driving scene understanding, providing structured priors for planning and prediction. Recent transformer-based methods regress vectorized map elements via deformable attention over Bird’s-Eye View (BEV) features. They typically employ a single-pass paradigm, starting from a set of initial queries. However, these queries struggle to precisely localize map elements within the large-scale BEV space. This difficulty is severely amplified when using lightweight backbones that produce less distinctive features. To address this, we propose RefDiffMap, which recasts map construction as a progressive refinement process driven by a diffusion model. We introduce a novel denoising query generator that, at each step, leverages the intermediate noisy geometry to sample relevant features from adaptive BEV RoIs. These features are distilled into context-aware queries that guide the decoder’s next refinement. This creates a powerful geometry-feature co-evolution loop, allowing the model to iteratively correct localization errors. Comprehensive experiments show that RefDiffMap achieves competitive performance on the nuScenes and Argoverse 2 datasets. Notably, its robustness is highlighted with a ResNet-18 backbone, where it improves mAP by a significant 11.3% over our baseline MapTRv2. Further ablation studies validate the effectiveness of our approach.

Index terms

Semantic Scene Understanding, Mapping, Computer Vision for Transportation