MSDNet: Efficient 4D Radar Super-Resolution Via Multi-Stage Distillation
Minqing Huang, Shouyi Lu, Boyuan Zheng, Ziyao Li, Xiao Tang, Guirong Zhuo
AI summary
Problem
4D radar point clouds are inherently sparse and noisy, limiting their use in fine-grained autonomous perception, while existing super-resolution methods suffer from high training costs, complex diffusion sampling, or poor generalization.
Approach
The method transfers dense LiDAR geometric priors to 4D radar features through reconstruction-guided distillation, then refines them with a lightweight diffusion network and an adaptive noise alignment module.
Key results
- First knowledge distillation framework applied to 4D radar super-resolution
- Achieves high-fidelity reconstruction with significantly reduced inference latency
- Outperforms existing methods on reconstruction metrics across VoD and in-house datasets
- Delivers substantial performance gains on downstream autonomous driving tasks
Why it matters
Enables reliable, real-time 4D radar perception for autonomous driving in adverse weather without the computational overhead of traditional diffusion models.
Abstract
4D radar super-resolution, which aims to reconstruct sparse and noisy point clouds into dense and geometrically consistent representations, is a foundational problem in autonomous perception. However, existing methods often suffer from high training cost or rely on complex diffusion-based sampling, which leads to high inference latency and poor generalization and makes it difficult to balance accuracy and efficiency. To address these limitations, we propose MSDNet, a multi-stage distillation framework that efficiently transfers dense LiDAR priors to 4D radar features to achieve both high reconstruction quality and computational efficiency. The first stage performs reconstruction-guided feature distillation (RGFD), aligning and densifying the student’s features through feature reconstruction. In the second stage, we propose diffusion-guided feature distillation (DGFD), which treats the stage-one distilled features as a noisy version of the teacher’s representations and refines them via a lightweight diffusion network. Furthermore, we introduce a noise adapter that adaptively aligns the noise level of the features with a predefined diffusion timestep, enabling more precise denoising. Extensive experiments on the VoD and in-house datasets demonstrate that MSDNet achieves both high-fidelity reconstruction and low-latency inference in the task of 4D radar point cloud super-resolution, and consistently improves performance on downstream tasks.
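The idea of treating student features as a noisy version of the teacher's features can be illustrated with the standard forward diffusion relation x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. The sketch below is not the MSDNet implementation. The linear beta schedule, the function names, the scalar `gamma` blend used for the noise adapter, and the MSE distillation loss are all assumptions chosen for illustration; in a real system the noise estimate would come from a learned lightweight denoiser and `gamma` from a learned adapter.

```python
import numpy as np


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal fraction after t steps of an (assumed) linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return float(np.prod(1.0 - betas[:t]))


def forward_noise(x0, t, eps):
    """Standard forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bar(t)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps


def predict_x0(x_t, eps_hat, t):
    """Invert the forward relation given a noise estimate eps_hat."""
    a = alpha_bar(t)
    return (x_t - np.sqrt(1.0 - a) * eps_hat) / np.sqrt(a)


def noise_adapter(student_feat, gamma, eps):
    """Hypothetical adapter: blend the student feature with Gaussian noise.

    gamma in [0, 1] stands in for a learned scalar that moves the feature's
    noise level toward the one expected at the predefined timestep.
    """
    return gamma * student_feat + (1.0 - gamma) * eps


def distill_loss(refined_student, teacher_feat):
    """Assumed distillation objective: mean squared error to the teacher feature."""
    return float(np.mean((refined_student - teacher_feat) ** 2))
```

With an exact noise estimate, `predict_x0(forward_noise(x0, t, eps), eps, t)` recovers `x0` up to floating-point error, which is the property a denoiser trained on teacher features tries to approximate when it is applied to adapted student features.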