ICRA 2026

Dual Prompt-Driven Feature Encoding for Nighttime UAV Tracking

Yiheng Wang, Changhong Fu, Liangliang Yao, Haobo Zuo, Zijie Zhang

AI summary

DPTracker outperforms state-of-the-art trackers on nighttime UAV benchmarks by adaptively encoding illumination and viewpoint cues into Vision Transformer features.
Nighttime UAV tracking, Prompt tuning, Feature encoding, Illumination adaptation, Viewpoint invariance, Vision Transformer

Problem

Existing UAV trackers struggle in nighttime conditions because their feature encoding ignores critical illumination degradation and dynamic aerial viewpoint variations, leading to poor tracking robustness.

Approach

The authors introduce DPTracker, which injects learned illumination and viewpoint prompts into a Vision Transformer backbone to dynamically adapt feature representations for low-light, aerial tracking scenarios.
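The general idea of injecting prompts into a Transformer can be sketched as follows. This is a minimal illustration of generic prompt-token injection, not the DPBlock design from the paper: it assumes prompts are extra tokens concatenated with the patch tokens, and it uses a single attention step with identity query/key/value projections. The names `prompted_block`, `illum_prompts`, and `view_prompts` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    A real Vision Transformer learns separate projections; they are
    omitted here (an assumption) to keep the sketch short.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for query in tokens:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        mixed.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return mixed


def prompted_block(patch_tokens, illum_prompts, view_prompts):
    """Concatenate both prompt sets with the patch tokens and mix them jointly.

    Every token attends to every other token, so the patch features are
    conditioned on the prompts and the prompts are updated from the features.
    Returns (updated illumination prompts, updated viewpoint prompts,
    updated patch tokens).
    """
    tokens = illum_prompts + view_prompts + patch_tokens
    mixed = self_attention(tokens)
    n_i, n_v = len(illum_prompts), len(view_prompts)
    return mixed[:n_i], mixed[n_i:n_i + n_v], mixed[n_i + n_v:]


patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
illum = [[0.5, 0.5]]
view = [[0.2, 0.8], [0.8, 0.2]]
new_illum, new_view, new_patches = prompted_block(patches, illum, view)
```

Because the updated prompts are returned alongside the updated patch tokens, they could be passed to a following block, which is one simple way to let prompts change with context across layers.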

Key results

  • Introduces DPBlock with bidirectional prompt-feature interaction for adaptive representation learning
  • Designs a pyramid illumination prompter to extract multi-scale, frequency-aware lighting cues
  • Develops a dynamic viewpoint prompter using deformable convolutions to capture aerial geometric variations
  • Achieves state-of-the-art precision and success rates on UAVDark135 and other nighttime tracking benchmarks
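To make "multi-scale, frequency-aware lighting cues" concrete, here is a small sketch built on a Laplacian-style pyramid. It is an assumption-laden stand-in, not the paper's pyramid illumination prompter: it uses 2x2 average pooling, nearest-neighbour upsampling, and a hand-picked pair of statistics (overall brightness and per-level high-frequency energy). The function names are hypothetical, and the image is a plain list of rows whose height and width stay even at every level.

```python
def downsample(img):
    """2x2 average pooling; assumes even height and width."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def upsample(img):
    """Nearest-neighbour 2x upsampling."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def illumination_cues(img, levels):
    """Return the overall brightness and the high-frequency energy per level.

    At each level the image is split into a smooth (low-frequency) part and
    a residual (high-frequency) part, as in a Laplacian pyramid.
    """
    pixel_count = len(img) * len(img[0])
    mean_brightness = sum(sum(row) for row in img) / pixel_count
    band_energy = []
    current = img
    for _ in range(levels):
        low = downsample(current)
        smooth = upsample(low)
        residual = [
            [c - s for c, s in zip(cur_row, smooth_row)]
            for cur_row, smooth_row in zip(current, smooth)
        ]
        count = len(residual) * len(residual[0])
        band_energy.append(sum(v * v for row in residual for v in row) / count)
        current = low
    return {"mean_brightness": mean_brightness, "band_energy": band_energy}


# A 4x4 checkerboard has all of its detail at the finest scale.
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
cues = illumination_cues(checker, levels=2)
```

In a learned prompter, statistics like these would be replaced by trainable layers; the sketch only shows how a pyramid separates lighting information by scale and frequency band.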

Why it matters

Enables reliable autonomous UAV operations in low-light environments, critical for real-world applications like surveillance, navigation, and disaster response.

Abstract

Robust feature encoding constitutes the foundation of UAV tracking by enabling the nuanced perception of target appearance and motion, thereby playing a pivotal role in ensuring reliable tracking. However, existing feature encoding methods often overlook critical illumination and viewpoint cues, which are essential for robust perception under challenging nighttime conditions, leading to degraded tracking performance. To overcome the above limitation, this work proposes a dual prompt-driven feature encoding method that integrates prompt-conditioned feature adaptation and context-aware prompt evolution to promote domain-invariant feature encoding. Specifically, the pyramid illumination prompter is proposed to extract multi-scale frequency-aware illumination prompts. The dynamic viewpoint prompter modulates deformable convolution offsets to accommodate viewpoint variations, enabling the tracker to learn view-invariant features. Extensive experiments validate the effectiveness of the proposed dual prompt-driven tracker (DPTracker) in tackling nighttime UAV tracking. Ablation studies highlight the contribution of each component in DPTracker. Real-world tests under diverse nighttime UAV tracking scenarios further demonstrate the robustness and practical utility. The code and demo videos are available at https://github.com/yiheng-wang-duke/DPTracker.

Index terms

Aerial Systems: Applications; Computer Vision for Automation; Deep Learning for Visual Perception
