ICRA 2023

TransVisDrone: Spatio-Temporal Transformer for Vision-Based Drone-To-Drone Detection in Aerial Videos

Tushar Bharat Sangam, Ishan Rajendrakumar Dave, Waqas Sultani, Mubarak Shah

Abstract

Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones. However, existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices. In this work, we propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency. We utilize CSPDarkNet-53 network to learn object-related spatial features and VideoSwin model to improve drone detection in challenging scenarios by learning spatio-temporal dependencies of drone motion. Our method achieves state-of-the-art performance on three challenging real-world datasets (Average Precision@0.5IOU): NPS 0.95, FLDrones 0.75, and AOT 0.80, and a higher throughput than previous methods. We also demonstrate its deployment capability on edge devices and its usefulness in detecting drone-collision (encounter). Project: https://tusharsangam.github.io/TransVisDrone-project-page/

Index terms

Deep Learning for Visual Perception; Computer Vision for Automation; Aerial Systems: Perception and Autonomy