ICRA 2026

Diffusion-Guided Generalizable Enhancer for Urban Scene Reconstruction

Henry Che, Jingkang Wang, Yun Chen, Ze Yang, Siva Manivasagam, Raquel Urtasun

AI summary

GenRe efficiently fixes 3D Gaussian reconstruction artifacts at novel viewpoints using a diffusion-guided enhancer, enabling robust urban scene simulation without costly per-scene optimization.

Urban scene reconstruction, 3D Gaussian Splatting, Diffusion models, Neural fixer, Autonomous driving simulation, Generalizable enhancement

Problem

Standard 3D Gaussian Splatting methods for urban driving scenes overfit to the recorded trajectories, so renderings degrade severely and show artifacts under large viewpoint shifts. Existing diffusion-based fixers require hours of per-scene optimization, and the representations they distill fail to generalize beyond a limited set of synthesized views.

Approach

GenRe first applies a one-step diffusion-based 2D neural fixer, which uses geometry and appearance cues to correct novel-view artifacts. A generalizable 3D enhancer network then iteratively updates the Gaussian parameters to distill these corrections into a consistent 3D representation.
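The control flow of this fix-then-distill loop can be sketched as below. This is only a toy illustration, not the paper's implementation: the scene is reduced to one scalar colour per Gaussian, and `render`, `toy_fixer`, `toy_enhancer_step`, and `enhance` are hypothetical names in which clamping stands in for the diffusion fixer and a fixed-step update stands in for the learned enhancer network.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = List[float]  # toy "image": one intensity per Gaussian


@dataclass
class Gaussians:
    """Toy stand-in for a 3D Gaussian scene: one scalar colour per primitive."""
    colors: List[float]


def render(scene: Gaussians, view_offset: float) -> Image:
    # Hypothetical renderer. This toy version ignores the viewpoint;
    # a real splatting renderer would project the Gaussians into the view.
    return list(scene.colors)


def toy_fixer(image: Image) -> Image:
    # Stand-in for the one-step 2D fixer (assumption): treat out-of-range
    # intensities as artifacts and clamp them into [0, 1].
    return [min(1.0, max(0.0, p)) for p in image]


def toy_enhancer_step(scene: Gaussians, rendered: Image, fixed: Image,
                      step: float = 0.5) -> Gaussians:
    # Stand-in for the learned 3D enhancer (assumption): move each parameter
    # part of the way toward the value suggested by the fixed image.
    return Gaussians([c + step * (f - r)
                      for c, r, f in zip(scene.colors, rendered, fixed)])


def enhance(scene: Gaussians,
            views: Sequence[float],
            fixer: Callable[[Image], Image] = toy_fixer,
            enhancer_step: Callable[..., Gaussians] = toy_enhancer_step,
            num_iters: int = 3) -> Gaussians:
    """Alternate 2D fixing and 3D parameter updates over a set of novel views."""
    for _ in range(num_iters):
        for view in views:
            rendered = render(scene, view)      # render the current 3D state
            fixed = fixer(rendered)             # correct artifacts in 2D
            scene = enhancer_step(scene, rendered, fixed)  # distill into 3D
    return scene


# Usage: two colours start outside [0, 1] and are pulled back toward the range.
enhanced = enhance(Gaussians([1.8, 0.4, -0.6]), views=[-2.0, 2.0])
```

The point of the sketch is the ordering: corrections are produced in image space but stored in the 3D parameters, so later renders from any view inherit them.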

Key results

Reduces reconstruction artifacts within minutes instead of hours
Maintains high fidelity under meter-scale lateral viewpoint shifts
Generalizes reliably to challenging extrapolations like lane changes
Improves downstream autonomous driving simulation and perception tasks
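To make the notion of a lateral viewpoint shift in the results above concrete, here is a minimal sketch of how a shifted evaluation pose could be derived from a recorded one. It assumes a 4x4 camera-to-world matrix whose first column is the camera's right axis; the function name `shift_laterally` and the offsets used are illustrative and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # 4x4 camera-to-world pose as nested lists


def shift_laterally(cam_to_world: Matrix, meters: float) -> Matrix:
    """Return a copy of the pose moved `meters` along the camera's own right axis.

    Assumption: the first column of the rotation block is the camera's
    right (x) axis expressed in world coordinates.
    """
    right = [cam_to_world[r][0] for r in range(3)]
    shifted = [row[:] for row in cam_to_world]  # copy so the input is untouched
    for r in range(3):
        shifted[r][3] += meters * right[r]      # translate, keep orientation
    return shifted


# Usage: a camera at (10, 0, 1.5) with identity orientation, moved 3 m sideways.
recorded = [
    [1.0, 0.0, 0.0, 10.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.5],
    [0.0, 0.0, 0.0, 1.0],
]
novel = shift_laterally(recorded, 3.0)
```

Only the translation changes; the viewing direction stays the same, which is what distinguishes a lateral shift from a rotation of the recorded camera.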

Why it matters

Provides a scalable, efficient pipeline for high-fidelity sensor simulation and closed-loop testing in autonomous driving development.

Abstract

Urban scene reconstruction from real-world observations has emerged as a powerful tool for self-driving development and testing. While current neural rendering approaches achieve high-fidelity rendering along the recorded trajectories, their quality degrades significantly under large viewpoint shifts, limiting their applicability for closed-loop simulation. Recent works have shown promising results in using diffusion models to enhance quality at these challenging viewpoints and distill improvements back into 3D representations. However, they often require costly per-scene optimization, and the distilled representations remain fragile and fail to generalize beyond limited synthesized views. To address these limitations, we propose GenRe, a novel diffusion-guided generalizable enhancer for urban scene reconstruction. GenRe takes as input any pretrained 3D Gaussian representation and fixes its deficiencies within a few minutes. By learning to distill generative priors across diverse scenes, GenRe efficiently produces a robust, high-fidelity representation that generalizes reliably to challenging unseen viewpoints (e.g., lane changes). Experiments show that GenRe outperforms existing methods in both quality and efficiency and benefits various downstream tasks, enabling robust and scalable sensor simulation for autonomous driving.

Index terms

Computer Vision for Automation; Autonomous Vehicle Navigation; Simulation and Animation