ICRA 2026

4D Radar Diffusion with Adaptive Visual-Aided Condition for Point Cloud Enhancement

Renxiang Xiao, Yuanfan Zhang, Wei Liu, Guangzhong Dong, Yunjiang Lou, Liang Hu

AI summary

Fusing synchronized camera and 4D radar data into a conditional latent diffusion model generates dense, geometrically consistent point clouds that outperform existing methods, even under visual occlusions.

4D Radar, Diffusion Models, Point Cloud Enhancement, Vision-Radar Fusion, Autonomous Driving, Sensor Fusion

Problem

Raw 4D radar point clouds are inherently sparse and noisy, which limits autonomous driving perception. Current diffusion-based enhancement methods suffer from a sensor-physics mismatch because they rely on LiDAR for training but use only radar at inference, leading to incomplete geometry under occlusions.

Approach

The method decomposes radar data into depth and bird’s-eye-view features, fuses them with synchronized camera images via an adaptive attention module, and conditions a latent diffusion model to denoise and densify the point cloud without requiring LiDAR during inference.
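The decomposition step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the pinhole intrinsics `K`, the camera-frame axis convention (x right, y down, z forward), and the grid extents are all assumptions made for illustration. It shows one plausible way to turn a radar point set into a sparse depth view and a BEV occupancy view.

```python
import numpy as np

def radar_to_depth_view(points_cam, K, height, width):
    """Project points (N, 3) given in an assumed camera frame to a sparse depth image.

    Pixels with no radar return are 0; if several points hit one pixel,
    the nearest depth is kept.
    """
    z = points_cam[:, 2]
    in_front = z > 0                      # drop points behind the camera
    pts, z = points_cam[in_front], z[in_front]
    uvw = (K @ pts.T).T                   # homogeneous pixel coordinates
    u = np.floor(uvw[:, 0] / z).astype(int)
    v = np.floor(uvw[:, 1] / z).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf, dtype=np.float32)
    # ufunc.at handles repeated pixel indices correctly (unbuffered minimum)
    np.minimum.at(depth, (v[inside], u[inside]), z[inside].astype(np.float32))
    depth[np.isinf(depth)] = 0.0
    return depth

def radar_to_bev_view(points_cam, x_range, z_range, cell):
    """Count points per cell on the ground plane (lateral x, forward z)."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    nz = int(round((z_range[1] - z_range[0]) / cell))
    ix = np.floor((points_cam[:, 0] - x_range[0]) / cell).astype(int)
    iz = np.floor((points_cam[:, 2] - z_range[0]) / cell).astype(int)
    inside = (ix >= 0) & (ix < nx) & (iz >= 0) & (iz < nz)
    bev = np.zeros((nz, nx), dtype=np.float32)
    np.add.at(bev, (iz[inside], ix[inside]), 1.0)  # accumulate duplicates
    return bev

# Hypothetical toy input: three points in front of the camera, one behind it.
points = np.array([[0.0, 0.0, 10.0],
                   [0.0, 0.0, 5.0],
                   [1.0, 0.0, 10.0],
                   [0.0, 0.0, -1.0]])
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
depth_view = radar_to_depth_view(points, K, height=48, width=64)
bev_view = radar_to_bev_view(points, x_range=(-10.0, 10.0), z_range=(0.0, 20.0), cell=1.0)
```

In a pipeline like the one summarized here, the depth view would be the radar input that is fused with the camera image, while the BEV view would be tokenized separately as a second conditioning signal; how the paper encodes either view is not specified in this summary.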

Key results

State-of-the-art 2D enhancement metrics on the VoD dataset
Recovery of complete object contours under visual occlusions
Accurate odometry and high-fidelity map reconstruction from enhanced radar
Robust performance in adverse weather by adaptively switching between visual and radar inputs
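The adaptive weighting between visual and radar inputs mentioned in the last result can be illustrated with a simple learned gate. This is a deliberately simplified stand-in, not the paper's adaptive attention module: the function `gated_fusion`, the parameters `w_gate` and `b_gate`, and the feature shapes are hypothetical. The sketch only shows how a per-feature gate can shift weight toward radar when image features are unreliable.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(radar_feat, image_feat, w_gate, b_gate):
    """Blend radar and image features of shape (N, C) with a learned gate.

    gate -> 1 favors the image features, gate -> 0 favors the radar features.
    w_gate has shape (2C, C) and b_gate has shape (C,); both are hypothetical
    parameters that would normally be learned during training.
    """
    joint = np.concatenate([radar_feat, image_feat], axis=-1)  # (N, 2C)
    gate = sigmoid(joint @ w_gate + b_gate)                    # (N, C), values in (0, 1)
    fused = gate * image_feat + (1.0 - gate) * radar_feat
    return fused, gate

# Hypothetical toy features: 4 tokens with 8 channels each.
rng = np.random.default_rng(0)
radar_feat = rng.normal(size=(4, 8))
image_feat = rng.normal(size=(4, 8))
w_gate = np.zeros((16, 8))

# A strongly negative bias closes the gate, so the output falls back to radar.
fused_radar_only, _ = gated_fusion(radar_feat, image_feat, w_gate, np.full(8, -50.0))
# A zero bias with zero weights gives an equal blend of both modalities.
fused_equal, gate_equal = gated_fusion(radar_feat, image_feat, w_gate, np.zeros(8))
```

In a trained model the gate would depend on the input features themselves, so degraded image regions (for example under occlusion or in bad weather) could receive lower weight than clear ones.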

Why it matters

It supports reliable, high-quality perception for autonomous vehicles and robots in adverse weather and occluded scenes, and it requires no LiDAR at inference.

Abstract

Despite its resilience in adverse weather, millimeter-wave (mmWave) radar yields sparse and noisy point clouds that limit its perception and localization performance. Diffusion models have recently gained attention for enhancing mmWave radar in perception tasks due to their strong denoising and generative capabilities. Yet the enhanced radar point clouds still fall short of expectations because they lack texture information and suffer from errors caused by the inherent sensor–model mismatch between LiDAR and radar. In this paper, we propose an adaptive vision-aided radar data enhancement method based on a conditional diffusion model for denoising and densifying radar point clouds. The pipeline decomposes mmWave radar data into depth and bird's-eye-view (BEV) views, fuses the depth view with synchronized images, and uses the fused features together with BEV tokens to condition the diffusion model. LiDAR is used only for training supervision, not for inference. Extensive experiments demonstrate that our method produces dense and geometrically consistent radar point clouds, validating the effectiveness of the introduced vision aid for radar enhancement. Notably, our method works well even under visual occlusions. The accurate odometry and high-fidelity map reconstruction obtained with the enhanced radar point clouds highlight the great potential of our method for other downstream tasks in robotics and autonomous driving.

Index terms

Deep Learning Methods; Range Sensing