ICRA 2026

RefDiffMap: Diffusion-Guided Progressive Refinement for Vectorized HD Map Construction

Wenjie Gao, Entao Chang, Jiawei Fu, Ziyu Zhu, Shitao Chen, Nanning Zheng

AI summary

Key figure (auto-extracted from paper)
RefDiffMap uses a diffusion-guided iterative refinement loop to improve HD map localization accuracy, particularly with lightweight backbones.
HD Map Construction, Diffusion Models, Vectorized Mapping, Progressive Refinement, Autonomous Driving, BEV Perception

Problem

Single-pass transformer methods for vectorized HD map construction struggle to precisely localize map elements in large-scale BEV spaces, a limitation severely worsened by lightweight backbones with less distinctive features.

Approach

The method recasts map construction as a progressive denoising process, using a novel query generator that dynamically resamples BEV features based on intermediate noisy geometry to create a continuous geometry-feature co-evolution loop.
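The loop described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`bilinear_sample`, `refine`, `toy_decoder`), the mean-pooled query, the one-channel synthetic BEV map, and the fixed step count are all assumptions made for illustration. The real model would use a learned decoder, adaptive BEV RoIs, and a diffusion noise schedule. The sketch only shows the idea of resampling BEV features at the current noisy geometry and feeding them back into the next refinement step.

```python
import numpy as np

def bilinear_sample(bev, pts):
    """Sample a (C, H, W) feature map at normalized (x, y) points in [0, 1]^2.

    pts has shape (N, 2); the result has shape (N, C).
    """
    _, H, W = bev.shape
    x = np.clip(pts[:, 0], 0.0, 1.0) * (W - 1)
    y = np.clip(pts[:, 1], 0.0, 1.0) * (H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = bev[:, y0, x0] * (1 - wx) + bev[:, y0, x1] * wx  # shape (C, N)
    bot = bev[:, y1, x0] * (1 - wx) + bev[:, y1, x1] * wx
    return (top * (1 - wy) + bot * wy).T

def refine(bev, noisy_pts, decoder, num_steps=4):
    """Hypothetical refinement loop: resample features, build a query, update geometry."""
    pts = noisy_pts.copy()
    history = [pts.copy()]
    for _ in range(num_steps):
        feats = bilinear_sample(bev, pts)   # features at the current noisy geometry
        query = feats.mean(axis=0)          # assumed pooling into one element-level query
        pts = decoder(query, feats, pts)    # decoder proposes refined geometry
        history.append(pts.copy())
    return pts, history

def toy_decoder(query, feats, pts):
    """Stand-in for a learned decoder: move each point halfway along the sampled x-offset."""
    step = 0.5 * feats[:, 0]
    return pts + np.stack([step, np.zeros_like(step)], axis=1)

# Synthetic BEV map: every cell stores the signed x-offset to a lane divider at x = 0.5.
H, W = 32, 32
bev = np.zeros((1, H, W))
bev[0] = (0.5 - np.linspace(0.0, 1.0, W))[None, :]

# Noisy initial polyline scattered around the true divider.
rng = np.random.default_rng(0)
noisy_pts = np.stack(
    [0.5 + rng.normal(0.0, 0.1, 10), np.linspace(0.1, 0.9, 10)], axis=1
)

final_pts, history = refine(bev, noisy_pts, toy_decoder, num_steps=4)
errors = [float(np.abs(h[:, 0] - 0.5).max()) for h in history]
```

In this toy setup the sampled feature depends on where the current geometry sits, so each pass sees different features and the localization error in `errors` shrinks from step to step. A single-pass decoder with fixed initial queries would get only one such correction.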

Key results

  • Achieves competitive performance on nuScenes and Argoverse 2 datasets
  • Boosts mAP by 11.3% over the MapTRv2 baseline when both use a ResNet-18 backbone
  • Introduces a denoising query generator for dynamic geometry-feature alignment
  • Validates the iterative refinement loop as the primary driver of performance gains

Why it matters

Enables more accurate and robust online HD map construction for autonomous driving, especially in resource-constrained scenarios.

Abstract

High-definition (HD) map learning serves as an essential component of autonomous driving scene understanding, providing structured priors for planning and prediction. Recent transformer-based methods regress vectorized map elements via deformable attention over Bird’s-Eye View (BEV) features. They typically employ a single-pass paradigm, starting from a set of initial queries. However, these queries struggle to precisely localize map elements within the large-scale BEV space. This difficulty is severely amplified when using lightweight backbones that produce less distinctive features. To address this, we propose RefDiffMap, which recasts map construction as a progressive refinement process driven by a diffusion model. We introduce a novel denoising query generator that, at each step, leverages the intermediate noisy geometry to sample relevant features from adaptive BEV RoIs. These features are distilled into context-aware queries that guide the decoder’s next refinement. This creates a powerful geometry-feature co-evolution loop, allowing the model to iteratively correct localization errors. Comprehensive experiments show that RefDiffMap achieves competitive performance on the nuScenes and Argoverse 2 datasets. Notably, its robustness is highlighted with a ResNet-18 backbone, where it improves mAP by a significant 11.3% over our baseline MapTRv2. Further ablation studies validate the effectiveness of our approach.

Index terms

Semantic Scene Understanding, Mapping, Computer Vision for Transportation