SRCF-UAV: Sparse Radar-Camera Fusion for 3D UAV Detection
Yiming Zhao, Zijun Gong, Yang Yang, Ying Cui
AI summary
Problem
Existing 3D UAV detection methods either suffer from low accuracy when they rely on a single sensor or incur high computational cost when they use dense fusion. In addition, available multimodal datasets lack high-precision ground truth and diverse environmental coverage.
Approach
The method initializes object queries using combined 2D image proposals and radar point clouds, then iteratively refines them via a lightweight transformer decoder that sparsely fuses radar and camera features based on spatial distance and velocity differences.
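The summary does not give the exact fusion rule, so the following is only a minimal sketch of what "sparse" fusion gated by spatial distance and velocity difference could look like. Everything in it is a hypothetical assumption for illustration: the `Query` fields, the thresholds `max_dist` (metres) and `max_dv` (m/s), and the choice of scaled dot-product attention with a residual connection. Each radar query attends only to image queries that pass both gates, so the cost scales with the number of nearby pairs rather than with a dense feature map.

```python
import math
from dataclasses import dataclass


@dataclass
class Query:
    # Hypothetical query layout: 3D position (m), velocity (m/s), feature vector.
    x: float
    y: float
    z: float
    v: float
    feat: list[float]


def _is_neighbour(r: Query, c: Query, max_dist: float, max_dv: float) -> bool:
    """Gate: a pair may interact only if it is close in space AND in velocity."""
    d = math.dist((r.x, r.y, r.z), (c.x, c.y, c.z))
    return d <= max_dist and abs(r.v - c.v) <= max_dv


def sparse_fusion_pairs(radar, image, max_dist=2.0, max_dv=1.5):
    """Return (radar_idx, image_idx) pairs that survive both gates."""
    return [(i, j) for i, r in enumerate(radar) for j, c in enumerate(image)
            if _is_neighbour(r, c, max_dist, max_dv)]


def fuse(radar, image, max_dist=2.0, max_dv=1.5):
    """Masked attention: each radar query aggregates only its gated image neighbours."""
    fused = []
    for r in radar:
        neighbours = [c for c in image if _is_neighbour(r, c, max_dist, max_dv)]
        if not neighbours:
            fused.append(list(r.feat))  # no partner: keep the radar feature unchanged
            continue
        scale = math.sqrt(len(r.feat))
        scores = [sum(a * b for a, b in zip(r.feat, c.feat)) / scale for c in neighbours]
        m = max(scores)  # subtract max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        agg = [sum(wk * c.feat[d] for wk, c in zip(w, neighbours)) / z
               for d in range(len(r.feat))]
        fused.append([a + b for a, b in zip(r.feat, agg)])  # residual update
    return fused
```

In this toy setup, an image query at the same position as a radar query but moving in the opposite direction is excluded by the velocity gate, which is one plausible way velocity helps suppress false associations.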
Key results
- Novel sparse radar-camera fusion framework with improved query initialization
- RTK-based multimodal dataset with centimeter-level ground truth and over 20,000 UAV instances across diverse scenarios, UAV models, and lighting conditions
- Average precision of up to 91.65% and inference latency as low as 17 ms
- Superior accuracy and efficiency compared to state-of-the-art fusion detectors
Why it matters
Provides a practical, high-performance solution for low-altitude UAV surveillance, and the authors plan to release the dataset and code as a benchmark to accelerate multimodal detection research.
Abstract
With the rapid development of the low-altitude economy, accurate detection and localization of UAVs have become increasingly important. Conventional radar-only and vision-only detection methods have low accuracy, whereas current radar-camera fusion methods are computationally intensive. To overcome these issues, we propose a novel 3D UAV detection approach based on sparse radar-camera fusion, called SRCF-UAV, to achieve high-precision, low-complexity UAV detection in diverse scenarios. Specifically, we first propose an improved query initialization method that incorporates locations from 2D image proposals and radar point clouds. Then, we propose a query update method that sparsely fuses radar and image queries based on features, velocity, and spatial distance. Furthermore, we develop a radar-camera multimodal data collection platform based on real-time kinematic (RTK) positioning and collect a dataset with centimeter-level precision, comprising over 20,000 UAV instances that cover various scenarios, UAV models, and lighting conditions. Finally, extensive experiments on this dataset demonstrate that the proposed approach achieves an average precision of up to 91.65% and an inference latency as low as 17 ms, validating its effectiveness and efficiency. The dataset and code will be made publicly available to support further research.
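The abstract states that query initialization "incorporates locations from 2D image proposals and radar point clouds" but does not say how. The sketch below shows one hypothetical way to do it, not the authors' method. It assumes radar points are already expressed in the camera frame (x right, y down, z forward, in metres), a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, and axis-aligned 2D boxes. For each box, radar points projecting inside it supply a depth (median, an assumed choice), and the box centre is back-projected to that depth. Radar points not covered by any box are kept as extra queries, so targets missed by the image detector can still be proposed. The names `Box2D`, `project`, and `init_queries` are illustrative only.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Box2D:
    # Hypothetical 2D proposal: top-left (u1, v1) and bottom-right (u2, v2) in pixels.
    u1: float
    v1: float
    u2: float
    v2: float


def project(p, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = p
    return fx * x / z + cx, fy * y / z + cy


def init_queries(boxes, radar_pts, fx, fy, cx, cy):
    """Seed 3D query locations from 2D boxes (depth from radar) plus unmatched radar points."""
    queries = []
    used = set()
    for b in boxes:
        inside = []
        for k, p in enumerate(radar_pts):
            if p[2] <= 0:  # behind the camera: cannot project
                continue
            u, v = project(p, fx, fy, cx, cy)
            if b.u1 <= u <= b.u2 and b.v1 <= v <= b.v2:
                inside.append(k)
        if not inside:
            continue  # assumed choice: a box without radar support yields no query
        used.update(inside)
        z = median(radar_pts[k][2] for k in inside)  # robust depth estimate
        uc, vc = (b.u1 + b.u2) / 2, (b.v1 + b.v2) / 2
        queries.append(((uc - cx) * z / fx, (vc - cy) * z / fy, z))  # back-project centre
    # Radar points not explained by any image proposal also seed queries.
    queries.extend(tuple(p) for k, p in enumerate(radar_pts) if k not in used)
    return queries
```

For example, with `fx = fy = 100` and `cx = cy = 50`, a box spanning pixels 40 to 60 that contains radar returns at depths 10 m and 20 m produces a query at `(0.0, 0.0, 15.0)`, while a radar point outside every box is passed through unchanged.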