RefDiffMap: Diffusion-Guided Progressive Refinement for Vectorized HD Map Construction
Wenjie Gao, Entao Chang, Jiawei Fu, Ziyu Zhu, Shitao Chen, Nanning Zheng
AI summary
Problem
Single-pass transformer methods for vectorized HD map construction struggle to precisely localize map elements in large-scale BEV spaces, a limitation severely worsened by lightweight backbones with less distinctive features.
Approach
The method recasts map construction as a progressive denoising process, using a novel query generator that dynamically resamples BEV features based on intermediate noisy geometry to create a continuous geometry-feature co-evolution loop.
Key results
- Achieves competitive performance on nuScenes and Argoverse 2 datasets
- Boosts mAP by 11.3% over MapTRv2 baseline using a ResNet-18 backbone
- Introduces a denoising query generator for dynamic geometry-feature alignment
- Validates the iterative refinement loop as the primary driver of performance gains
Why it matters
Enables more accurate and robust online HD map construction for autonomous driving, especially in resource-constrained scenarios.
Abstract
High-definition (HD) map learning serves as an essential component of autonomous driving scene understanding, providing structured priors for planning and prediction. Recent transformer-based methods regress vectorized map elements via deformable attention over Bird’s-Eye View (BEV) features. They typically employ a single-pass paradigm, starting from a set of initial queries. However, these queries struggle to precisely localize map elements within the large-scale BEV space. This difficulty is severely amplified when using lightweight backbones that produce less distinctive features. To address this, we propose RefDiffMap, which recasts map construction as a progressive refinement process driven by a diffusion model. We introduce a novel denoising query generator that, at each step, leverages the intermediate noisy geometry to sample relevant features from adaptive BEV RoIs. These features are distilled into context-aware queries that guide the decoder’s next refinement. This creates a powerful geometry-feature co-evolution loop, allowing the model to iteratively correct localization errors. Comprehensive experiments show that RefDiffMap achieves competitive performance on the nuScenes and Argoverse 2 datasets. Notably, its robustness is highlighted with a ResNet-18 backbone, where it improves mAP by a significant 11.3% over our baseline MapTRv2. Further ablation studies validate the effectiveness of our approach.
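The geometry-feature loop described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation and contains no diffusion model or transformer: the BEV map here is a hand-built displacement field pointing at a single horizontal lane, and `bilinear_sample`, `toy_decoder`, and `refine` are hypothetical names introduced only for this example. What it does show is the structure of the loop, in which features are resampled at the current noisy geometry on every step and the result drives the next geometric update.

```python
import numpy as np

def bilinear_sample(bev, pts):
    """Sample a (C, H, W) feature map at continuous (x, y) pixel coords of shape (..., 2)."""
    _, H, W = bev.shape
    x = np.clip(pts[..., 0], 0, W - 1)
    y = np.clip(pts[..., 1], 0, H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    out = (bev[:, y0, x0] * (1 - wx) * (1 - wy)
           + bev[:, y0, x0 + 1] * wx * (1 - wy)
           + bev[:, y0 + 1, x0] * (1 - wx) * wy
           + bev[:, y0 + 1, x0 + 1] * wx * wy)
    return np.moveaxis(out, 0, -1)  # (..., C)

def toy_decoder(queries):
    """Assumed stand-in for the decoder: step halfway along the sampled offset."""
    return 0.5 * queries[..., :2]

def refine(bev, noisy_pts, steps=4, decoder=toy_decoder):
    """Alternate between resampling features at the current points and updating the points."""
    pts = noisy_pts.copy()
    history = [pts.copy()]
    for _ in range(steps):
        queries = bilinear_sample(bev, pts)  # features follow the current geometry
        pts = pts + decoder(queries)         # geometry follows the new features
        history.append(pts.copy())
    return pts, history

# Toy BEV map: channel 0 is the x-offset (zero), channel 1 the y-offset to a lane at y = 20.
H, W, lane_y = 32, 64, 20.0
rows = np.arange(H, dtype=float)[:, None] * np.ones((1, W))
bev = np.stack([np.zeros((H, W)), lane_y - rows])

# One map element with 10 points, perturbed vertically around the lane.
rng = np.random.default_rng(0)
xs = np.linspace(5.0, 58.0, 10)
noisy = np.stack([xs, lane_y + rng.normal(0.0, 2.0, size=10)], axis=-1)[None]

refined, history = refine(bev, noisy)
errors = [float(np.abs(h[..., 1] - lane_y).max()) for h in history]
print(errors)  # worst-case distance to the lane after each step
```

Because each update is computed from features sampled at the latest point positions, a poor initial estimate is corrected gradually rather than in a single pass. In this toy setup the worst-case error shrinks on every step.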