ICRA 2026

SRCF-UAV: Sparse Radar-Camera Fusion for 3D UAV Detection

Yiming Zhao, Zijun Gong, Yang Yang, Ying CUI

AI summary

SRCF-UAV reaches up to 91.65% average precision with inference latency as low as 17 ms for 3D UAV detection by sparsely fusing radar and camera features through distance- and velocity-aware query updates.

radar-camera fusion, 3D UAV detection, sparse queries, multimodal dataset, real-time detection, low-altitude economy

Problem

Existing 3D UAV detection methods either suffer from low accuracy when they rely on a single sensor or incur high computational cost when they use dense fusion. Available multimodal datasets also lack high-precision ground truth and diverse environmental coverage.

Approach

The method initializes object queries using combined 2D image proposals and radar point clouds, then iteratively refines them via a lightweight transformer decoder that sparsely fuses radar and camera features based on spatial distance and velocity differences.
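The distance- and velocity-based sparse fusion described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `RadarPoint`, `Query`, `gate_points`, and `update_query`, the thresholds `max_dist` and `max_dv`, and the exponential weighting are all assumptions made for illustration, and the sketch fuses raw positions and velocities rather than learned features inside a transformer decoder.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RadarPoint:
    xyz: Tuple[float, float, float]  # position in metres (hypothetical layout)
    velocity: float                  # radial velocity in m/s


@dataclass
class Query:
    xyz: Tuple[float, float, float]  # current 3D position estimate
    velocity: float                  # current velocity estimate


def gate_points(query: Query, points: List[RadarPoint],
                max_dist: float = 2.0, max_dv: float = 1.0):
    """Keep only radar points near the query in both space and velocity.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    kept = []
    for p in points:
        dist = math.dist(query.xyz, p.xyz)
        dv = abs(query.velocity - p.velocity)
        if dist <= max_dist and dv <= max_dv:
            kept.append((p, dist, dv))
    return kept


def update_query(query: Query, points: List[RadarPoint],
                 max_dist: float = 2.0, max_dv: float = 1.0) -> Query:
    """Refine a query with a weighted average over the gated radar points."""
    kept = gate_points(query, points, max_dist, max_dv)
    if not kept:
        return query  # nothing nearby: leave the query unchanged
    # Closer points with more similar velocity receive larger weights.
    weights = [math.exp(-dist / max_dist - dv / max_dv) for _, dist, dv in kept]
    total = sum(weights)
    xyz = tuple(
        sum(w * p.xyz[i] for w, (p, _, _) in zip(weights, kept)) / total
        for i in range(3)
    )
    velocity = sum(w * p.velocity for w, (p, _, _) in zip(weights, kept)) / total
    return Query(xyz, velocity)
```

Because each query only touches the few radar points that pass both gates, the cost per update scales with the number of nearby points rather than with the full point cloud, which is the general intuition behind sparse fusion being cheaper than dense fusion.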

Key results

Novel sparse radar-camera fusion framework with improved query initialization
High-precision multimodal dataset with over 20,000 UAV instances across diverse conditions
Up to 91.65% average precision with inference latency as low as 17 ms
Superior accuracy and efficiency compared to state-of-the-art fusion detectors

Why it matters

Provides a practical, high-performance solution for low-altitude UAV surveillance while releasing a valuable benchmark dataset to accelerate multimodal detection research.

Abstract

With the rapid development of the low-altitude economy, accurate detection and localization of UAVs have become increasingly important. Conventional radar and visual detection methods have low accuracy, whereas current radar-camera fusion methods are computationally intensive. To overcome these issues, we propose a novel 3D UAV detection approach based on sparse radar-camera fusion, called SRCF-UAV, to achieve high-precision, low-complexity UAV detection in diverse scenarios. Specifically, we first propose an improved query initialization method that incorporates locations from 2D image proposals and radar point clouds. Then, we propose a query update method that sparsely fuses radar and image queries based on features, velocity, and spatial distance. Furthermore, we develop a radar-camera multimodal data collection platform based on real-time kinematic positioning (RTK) and collect a dataset of centimeter-level precision, comprising over 20,000 UAV instances that cover various scenarios, UAV models, and lighting conditions. Finally, extensive experiments on this dataset demonstrate that the proposed approach can achieve an average precision of up to 91.65% and an inference latency as low as 17 ms, validating its effectiveness and efficiency. The dataset and code will be publicly available to support further research.

Index terms

Object Detection, Segmentation and Categorization; Aerial Systems: Perception and Autonomy; Deep Learning for Visual Perception