ICRA 2026

LH-DETR: A Lightweight Hybrid Architecture for End-To-End Object Detection in UAV Images

Feifei Xu, Lupeng Sun, Dongyang Li, Guoxiang Wu, Chenchuan Lv

AI summary

LH-DETR achieves a superior accuracy-efficiency trade-off for UAV object detection by efficiently modeling multi-scale features and enhancing small object perception with minimal computational overhead.
Keywords: UAV object detection, lightweight detector, Mamba state-space, frequency-aware FFN, end-to-end detection, real-time inference

Problem

UAV object detection struggles with small, densely distributed objects and complex backgrounds, while limited onboard computing resources make traditional or heavy end-to-end detectors too slow and computationally expensive for real-time deployment.

Approach

The authors introduce LH-DETR, a lightweight end-to-end detector that combines a Wavelet-Mamba Hybrid Block for efficient multi-scale feature extraction, a Frequency-Aware Dynamic FFN to amplify critical high-frequency details, and an adaptive loss function for stable training.
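To make the wavelet side of the hybrid block more concrete, here is a minimal sketch of how a single-level 2D wavelet transform splits a feature map into one coarse sub-band and three detail sub-bands at half resolution. It assumes the Haar basis and plain nested lists; the summary does not say which wavelet LH-DETR uses, and the function name `haar_dwt2` is hypothetical rather than taken from the paper.

```python
def haar_dwt2(x):
    """Single-level 2D Haar transform of a 2D list with even height and width.

    Returns four half-resolution maps: a coarse approximation plus three
    detail maps (top-vs-bottom, left-vs-right, diagonal differences).
    Illustrative only: the wavelet used in the paper is not stated here.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    approx, top_bottom, left_right, diagonal = [], [], [], []
    for i in range(0, h, 2):
        row_a, row_tb, row_lr, row_d = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            row_a.append((a + b + c + d) / 2)    # low-frequency content
            row_tb.append((a + b - c - d) / 2)   # top row minus bottom row
            row_lr.append((a - b + c - d) / 2)   # left column minus right column
            row_d.append((a - b - c + d) / 2)    # diagonal difference
        approx.append(row_a)
        top_bottom.append(row_tb)
        left_right.append(row_lr)
        diagonal.append(row_d)
    return approx, top_bottom, left_right, diagonal


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
approx, top_bottom, left_right, diagonal = haar_dwt2(feature_map)
```

Applying the same function again to `approx` yields the next, coarser scale, which is the sense in which a wavelet transform provides multi-scale features. The division by 2 keeps the transform orthonormal, so the total energy of the four sub-bands equals that of the input.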

Key results

  • Introduces Wavelet-Mamba Hybrid Block (WMHB) for linear-complexity long-range dependency modeling and multi-scale feature extraction
  • Designs Frequency-Aware Dynamic FFN (FAD-FFN) that reduces parameters by 25% while selectively amplifying high-frequency components for small object detection
  • Proposes AutoSliding Varifocal Loss (ASVLoss) that dynamically shifts optimization focus from medium to high-quality predictions during training
  • Demonstrates outstanding accuracy-efficiency trade-offs on VisDrone, UAVVaste, and UAV-PDD datasets, significantly outperforming existing real-time detectors
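The idea of amplifying high-frequency components can be illustrated with a small frequency-domain filter. The sketch below is not FAD-FFN: it works on a 1D signal, uses a fixed `cutoff` and `gain` where the paper describes a dynamic, feature-dependent mechanism, and all function names are hypothetical. It only shows the underlying operation: transform, scale the high-frequency bins, transform back.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), adequate for a small demo."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def amplify_high_freq(signal, cutoff, gain):
    """Scale every frequency bin above `cutoff` by `gain` (both assumed fixed)."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        # Bins k and n - k hold the same frequency; scaling both equally
        # keeps the reconstructed signal real.
        freq = min(k, n - k)
        if freq > cutoff:
            spectrum[k] *= gain
    return [value.real for value in idft(spectrum)]


# A flat offset of 3 plus a fast oscillation of amplitude 1.
signal = [4.0, 2.0, 4.0, 2.0]
sharpened = amplify_high_freq(signal, cutoff=1, gain=2.0)
```

In this example the flat offset is left untouched while the oscillation doubles in amplitude, which is the 1D analogue of strengthening edges and textures while leaving smooth regions alone.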

Why it matters

Enables real-time, high-accuracy object detection on resource-constrained UAV platforms, advancing autonomous perception for applications like agriculture, security, and disaster relief.

Abstract

Object detection for unmanned aerial vehicles (UAVs) has become a research focus at the intersection of computer vision and robotics, with increasingly widespread applications in security inspection, agricultural monitoring, disaster relief, and other areas. Precise, real-time object detection is key to autonomous perception and decision-making in UAVs. However, objects seen from a UAV's perspective are often small and densely distributed, and onboard computing resources are limited, which poses significant challenges for traditional detection algorithms. To address these challenges, this paper proposes LH-DETR, a lightweight hybrid architecture for end-to-end object detection built on three specialized innovations. First, we propose a Wavelet-Mamba Hybrid Block (WMHB), a novel backbone component that combines the linear complexity of the Mamba state-space model for capturing long-range dependencies with the multi-scale feature extraction capabilities of wavelet transforms. Second, to better identify small objects, a Frequency-Aware Dynamic FFN (FAD-FFN) is designed to selectively amplify critical high-frequency components, such as edges and textures, by analyzing features in the frequency domain. Third, AutoSliding Varifocal Loss (ASVLoss), an adaptive loss function that dynamically shifts its focus from medium-quality to high-quality predictions as training progresses, is defined to stabilize the model's optimization. Experiments on public aerial datasets demonstrate that LH-DETR achieves an outstanding balance between accuracy and speed, significantly improving detection performance for small objects while greatly reducing computational complexity.
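The abstract does not give the ASVLoss formula, so the following is only a sketch of the general idea, under loudly stated assumptions. `varifocal_term` follows the standard Varifocal Loss form for a predicted score `p` and an IoU-aware target `q`, with commonly used defaults for `alpha` and `gamma`. The sliding part is hypothetical: a quality threshold that moves linearly from 0.5 to 0.8 over training, with a fixed `boost` applied to positives at or above it. None of these schedule values or names come from the paper.

```python
import math


def sliding_threshold(progress, start=0.5, end=0.8):
    """Hypothetical schedule: quality threshold rises linearly with progress in [0, 1]."""
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress


def varifocal_term(p, q, alpha=0.75, gamma=2.0, eps=1e-7):
    """Standard Varifocal Loss for one prediction.

    p: predicted score in (0, 1); q: target quality (IoU for positives, 0 for negatives).
    """
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    if q > 0:
        # Positives: binary cross-entropy against q, weighted by q itself.
        return -q * (q * math.log(p) + (1 - q) * math.log(1 - p))
    # Negatives: down-weighted by alpha * p^gamma.
    return -alpha * (p ** gamma) * math.log(1 - p)


def sliding_varifocal_loss(preds, targets, progress, boost=2.0):
    """Assumed weighting: positives with q >= threshold get an extra `boost`."""
    tau = sliding_threshold(progress)
    total = 0.0
    for p, q in zip(preds, targets):
        weight = boost if q >= tau else 1.0
        total += weight * varifocal_term(p, q)
    return total / max(len(preds), 1)


# A medium-quality positive (q = 0.6) is emphasized early but not late in training.
early = sliding_varifocal_loss([0.7], [0.6], progress=0.0)
late = sliding_varifocal_loss([0.7], [0.6], progress=1.0)
```

Under this assumed schedule, a prediction with target quality 0.6 receives the boost at the start of training and loses it by the end, while one with quality 0.9 is boosted throughout. That reproduces the described shift in emphasis from medium-quality toward high-quality predictions, though the paper's actual weighting may differ.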

Index terms

Object Detection, Segmentation and Categorization; Computer Vision for Manufacturing; Deep Learning for Visual Perception
