ICRA 2026

DroneKey++: A Size Prior-Free Method and New Benchmark for Drone 3D Pose Estimation from Sequential Images

Seo-Bin Hwang, Yeong-Jun Cho

AI summary

DroneKey++ enables accurate, real-time 3D drone pose estimation without requiring prior size or mesh data, supported by a new large-scale synthetic benchmark.

drone pose estimation, prior-free learning, synthetic benchmark, 3D tracking, anti-drone systems, keypoint detection

Problem

Existing methods rely on manually supplied physical-size or 3D-mesh priors, which limits deployment on unseen drones. Current datasets are also small-scale and model-specific, which makes it hard to test generalization reliably.

Approach

The framework jointly detects keypoints, classifies drone types, and estimates 3D pose using a learned decoder that implicitly encodes scale via class embeddings, eliminating external priors and PnP solvers.
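The ray-based input to a pose decoder can be illustrated with a minimal sketch. This is standard pinhole back-projection, not the paper's implementation: the function name `keypoints_to_rays`, the intrinsic matrix `K`, and the example pixel coordinates are all assumptions made for illustration, and the summary does not specify how DroneKey++ actually encodes rays.

```python
import numpy as np

def keypoints_to_rays(keypoints_px, K):
    """Back-project 2D pixel keypoints to unit viewing rays in the camera frame.

    keypoints_px: (N, 2) pixel coordinates (u, v).
    K: (3, 3) pinhole intrinsic matrix (hypothetical values below).
    """
    kp = np.asarray(keypoints_px, dtype=float)
    homog = np.hstack([kp, np.ones((kp.shape[0], 1))])  # (N, 3) rows [u, v, 1]
    rays = homog @ np.linalg.inv(K).T                   # each row is K^-1 [u, v, 1]^T
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)

# Hypothetical intrinsics: 800 px focal length, principal point at (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

rays = keypoints_to_rays([[320.0, 240.0], [400.0, 240.0]], K)
```

A ray fixes only the direction to a keypoint, not its depth, which is why a size cue is needed to recover metric translation. According to the summary, DroneKey++ obtains that cue implicitly from class embeddings rather than from a user-supplied size.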

Key results

Rotation MAE of 17.34° and translation MAE of 0.135 m
Inference speeds of 414 FPS (GPU) and 19 FPS (CPU)
Introduction of 6DroneSyn: 52K-image benchmark with 7 drone models and 88 backgrounds
Strong generalization across diverse drone types without manual size inputs

Why it matters

Provides a scalable, prior-free solution for real-time anti-drone surveillance and a comprehensive benchmark to advance future research.

Abstract

Accurate 3D pose estimation of drones is essential for security and surveillance systems. However, existing methods often rely on prior drone information such as physical sizes or 3D meshes. At the same time, current datasets are small-scale, limited to single models, and collected under constrained environments, which makes reliable validation of generalization difficult. We present DroneKey++, a prior-free framework that jointly performs keypoint detection, drone classification, and 3D pose estimation. The framework employs a keypoint encoder for simultaneous keypoint detection and classification, and a pose decoder that estimates 3D pose using ray-based geometric reasoning and class embeddings. To address dataset limitations, we construct 6DroneSyn, a large-scale synthetic benchmark with over 50K images covering 7 drone models and 88 outdoor backgrounds, generated using 360-degree panoramic synthesis. Experiments show that DroneKey++ achieves MAE 17.34° and MedAE 17.1° for rotation, MAE 0.135 m and MedAE 0.242 m for translation, with inference speeds of 19.25 FPS (CPU) and 414.07 FPS (GPU), demonstrating both strong generalization across drone models and suitability for real-time applications. The dataset is available at [link].
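For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute MAE (mean absolute error) and MedAE (median absolute error) over per-frame pose errors. It assumes rotation error is the geodesic angle between rotation matrices; the abstract does not say whether the paper uses this or a per-axis Euler error, and the helper names and toy error values are hypothetical.

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos_angle = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

def mae_and_medae(errors):
    """Return (mean, median) of the absolute per-frame errors."""
    e = np.abs(np.asarray(errors, dtype=float))
    return e.mean(), np.median(e)

# A 90-degree rotation about the z-axis compared against the identity.
R_z90 = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
angle = rotation_error_deg(R_z90, np.eye(3))

# Toy translation errors in metres (made-up values with one outlier).
mae, medae = mae_and_medae([0.1, 0.1, 0.1, 0.9])
```

In the toy data a single large error pulls the mean up while the median stays put, which is the usual reason for reporting both statistics.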

Index terms

Surveillance Robotic Systems; Deep Learning for Visual Perception; Data Sets for Robotic Vision