4D Radar Diffusion with Adaptive Visual-Aided Condition for Point Cloud Enhancement
Renxiang Xiao, Yuanfan Zhang, Wei Liu, Guangzhong Dong, Yunjiang Lou, Liang Hu
AI summary
Problem
Raw 4D radar point clouds are inherently sparse and noisy, limiting autonomous driving perception. Current diffusion-based enhancement methods suffer from a sensor mismatch between LiDAR and radar: they are trained against LiDAR but use only radar at inference, which leaves geometry incomplete under occlusions.
Approach
The method decomposes radar data into depth and bird’s-eye-view features, fuses them with synchronized camera images via an adaptive attention module, and conditions a latent diffusion model to denoise and densify the point cloud without requiring LiDAR during inference.
Key results
- State-of-the-art 2D enhancement metrics on the VoD dataset
- Recovery of complete object contours under visual occlusions
- Accurate odometry and high-fidelity map reconstruction from enhanced radar
- Robust performance in adverse weather by adaptively switching between visual and radar inputs
Why it matters
It enables reliable, high-quality perception for autonomous vehicles and robots in adverse weather and occluded scenarios where LiDAR fails.
Abstract
Despite its resilience in adverse weather, millimeter-wave (mmWave) radar yields sparse and noisy point clouds that limit its perception and localization performance. Diffusion models have recently gained attention for enhancing mmWave radar in perception tasks due to their strong denoising and generative capabilities. Yet the enhanced radar point clouds still fall short of expectations, owing to a lack of texture information and errors caused by the inherent sensor–model mismatch between LiDAR and radar. In this paper, we propose an adaptive vision-aided radar data enhancement method based on a conditional diffusion model for denoising and densifying radar point clouds. The pipeline decomposes mmWave radar data into depth and bird’s-eye-view (BEV) views, fuses the depth view with synchronized images, and uses the fused features together with BEV tokens to condition the diffusion model. LiDAR is used only for training supervision, not for inference. Extensive experiments demonstrate that our method produces dense and geometrically consistent radar point clouds, validating the effectiveness of the introduced vision aid for radar enhancement. Notably, our method works well even under visual occlusions. The accurate odometry and high-fidelity map reconstruction obtained from the enhanced radar point clouds highlight the great potential of our method for other downstream tasks in robotics and autonomous driving.
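The decomposition of radar points into a depth view and a BEV view can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `radar_to_depth_and_bev`, the camera-aligned frame (x right, y down, z forward, in metres), the pinhole intrinsics, and the grid extents and cell size are all assumptions chosen for illustration. The sketch keeps the nearest depth per pixel and counts points per BEV cell; it omits the learned feature extraction, fusion, and diffusion stages that the abstract describes.

```python
import math


def radar_to_depth_and_bev(points, fx, fy, cx, cy, width, height,
                           max_forward=50.0, half_width=25.0, cell=0.5):
    """Split radar points into a sparse depth image and a BEV count grid.

    Assumption: points are (x, y, z) tuples already expressed in a
    camera-aligned frame (x right, y down, z forward), in metres.
    """
    # Depth view: height x width image, 0.0 marks "no radar return".
    depth = [[0.0] * width for _ in range(height)]
    # BEV view: rows index forward distance, columns index lateral offset.
    rows = int(round(max_forward / cell))
    cols = int(round(2.0 * half_width / cell))
    bev = [[0] * cols for _ in range(rows)]

    for x, y, z in points:
        if z <= 0.0:
            continue  # point is behind the camera plane, skip it

        # Pinhole projection onto the image plane.
        u = int(math.floor(fx * x / z + cx))
        v = int(math.floor(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest return when several points hit one pixel.
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z

        # Top-down binning on the ground plane (forward z, lateral x).
        r = int(math.floor(z / cell))
        c = int(math.floor((x + half_width) / cell))
        if 0 <= r < rows and 0 <= c < cols:
            bev[r][c] += 1

    return depth, bev


# Hypothetical toy input: three points ahead of the sensor, one behind it.
toy_points = [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0), (1.0, 0.0, 10.0), (0.0, 0.0, -1.0)]
depth_img, bev_grid = radar_to_depth_and_bev(
    toy_points, fx=100.0, fy=100.0, cx=32.0, cy=24.0, width=64, height=48)
```

In a real pipeline the two views would typically carry extra radar channels such as Doppler velocity or radar cross-section, and the radar-to-camera extrinsics would be applied before projection; both are left out here to keep the sketch short.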