ICRA 2026

UltraVPR: Unsupervised Lightweight Rotation-Invariant Aerial Visual Place Recognition

Chao Chen, Chunyu Li, Mengfan He, Jun Wang, Fei Xing, Ziyang Meng

AI summary

UltraVPR delivers state-of-the-art aerial place recognition accuracy while remaining lightweight and robust to in-plane rotations, enabling reliable UAV navigation without GPS.
Visual Place Recognition, UAV Navigation, Rotation Invariance, Lightweight Deep Learning, Aerial Imagery, Unsupervised Training

Problem

CNN-based visual place recognition models degrade under the in-plane rotations typical of aerial imagery, while Transformer-based alternatives are too computationally heavy for resource-constrained UAVs.

Approach

The model pairs a rotation-equivariant E2ResNet backbone with a rotation-invariant aggregation layer so that descriptors stay consistent across orientations. It is trained with an unsupervised clustering strategy that strengthens the learned representation without increasing the descriptor dimensionality used at deployment.
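
To illustrate why an equivariant feature extractor followed by an invariant aggregation step gives orientation-independent descriptors, here is a minimal pure-Python sketch restricted to the four 90° rotations (the group C4). It is not the paper's E2ResNet or its aggregation layer: the helpers `rot90`, `c4_responses`, and `invariant_descriptor`, the toy template, and the choice of max pooling are all assumptions made for illustration.

```python
def rot90(grid):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]


def dot(a, b):
    """Element-wise product of two equally sized 2D lists, summed."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def c4_responses(image, template):
    """Correlate the image with the template at each of the 4 orientations.

    Rotating the image cyclically shifts this list (equivariance).
    """
    responses = []
    t = template
    for _ in range(4):
        responses.append(dot(image, t))
        t = rot90(t)
    return responses


def invariant_descriptor(image, templates):
    """Pool over the orientation axis; max is unaffected by cyclic shifts."""
    return [max(c4_responses(image, t)) for t in templates]


# Hypothetical toy data, chosen only to exercise the functions.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
template = [[1, 0, 0], [0, 0, 0], [0, 0, 2]]

print(c4_responses(image, template))
print(c4_responses(rot90(image), template))  # same values, cyclically shifted
print(invariant_descriptor(image, [template]) ==
      invariant_descriptor(rot90(image), [template]))
```

Rotating the input by 90° only permutes the orientation responses, so any pooling that ignores their order (max, mean, sum) leaves the descriptor unchanged.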

Key results

  • Proposes a lightweight rotation-invariant VPR architecture combining E2ResNet and rotation-invariant aggregation
  • Introduces an unsupervised training method using high-dimensional VLAD descriptors for optimization while maintaining low-dimensional deployment descriptors
  • Outperforms state-of-the-art methods on VP-Air, UAV-VisLoc, and AerialVL datasets
  • Achieves high Recall@1 performance with reduced memory and computational overhead for UAV deployment
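
For readers unfamiliar with the Recall@1 metric named above, the following sketch shows how Recall@K is commonly computed in place recognition: a query counts as a hit if at least one of its K nearest database descriptors is a true match. The function name `recall_at_k`, the Euclidean distance, and the toy data are assumptions; how each dataset defines a true match (for example, a distance threshold on ground-truth position) is not specified here.

```python
import math


def recall_at_k(query_descs, db_descs, positives, k=1):
    """Fraction of queries with a true match among their k nearest database entries.

    positives[i] is the set of database indices that are correct for query i.
    """
    hits = 0
    for query, correct in zip(query_descs, positives):
        # Rank database entries by Euclidean distance to the query descriptor.
        ranked = sorted(range(len(db_descs)),
                        key=lambda i: math.dist(query, db_descs[i]))
        if any(i in correct for i in ranked[:k]):
            hits += 1
    return hits / len(query_descs)


# Hypothetical 2-D descriptors, for illustration only.
database = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
queries = [[0.1, 0.0], [0.9, 0.1], [0.2, 0.9]]
ground_truth = [{0}, {2}, {2}]

print(recall_at_k(queries, database, ground_truth, k=1))
print(recall_at_k(queries, database, ground_truth, k=3))
```

The exhaustive sort is fine for a sketch; a larger database would typically use a nearest-neighbor index instead.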

Why it matters

Provides a computationally efficient and rotation-robust localization solution critical for GPS-denied UAV navigation in real-world aerial missions.

Abstract

Aerial Visual Place Recognition (VPR) is critical for Unmanned Aerial Vehicle (UAV) localization, especially in environments with unstable or unavailable GPS signals. While neural network-based VPR methods have become mainstream, they face significant challenges on UAV platforms. Traditional CNN-based VPR models are highly sensitive to image rotation, which degrades their performance in aerial environments. Meanwhile, Transformer-based models have high computational complexity, making them less suitable for resource-constrained UAVs. In this letter, we propose a lightweight, rotation-invariant aerial VPR method. Our approach combines a rotation-equivariant backbone network with a rotation-invariant aggregation layer to ensure descriptor consistency across different orientations. Additionally, we propose an unsupervised training strategy that constructs higher-dimensional descriptors to optimize the model, while maintaining the lower descriptor dimensionality during application. Experimental results show that our method outperforms state-of-the-art methods across multiple aerial VPR datasets. The code will be released at https://github.com/cbbhuxx/UltraVPR.
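
The summary above identifies the higher-dimensional training descriptors as VLAD descriptors. As background, here is a minimal sketch of classic hard-assignment VLAD, whose output dimensionality is the number of cluster centers times the local descriptor dimension. It is not the paper's training pipeline: the function name `vlad`, the fixed cluster centers, and the use of a single global L2 normalization are assumptions, and how the paper relates the training-time and deployment-time descriptors is not described here.

```python
import math


def vlad(local_descs, centers):
    """Classic VLAD: sum residuals to the nearest center, flatten, L2-normalize.

    Returns a list of length len(centers) * len(centers[0]).
    """
    k, d = len(centers), len(centers[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for x in local_descs:
        # Hard assignment: pick the closest cluster center.
        nearest = min(range(k), key=lambda j: math.dist(x, centers[j]))
        for i in range(d):
            residual_sums[nearest][i] += x[i] - centers[nearest][i]
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Hypothetical local descriptors and cluster centers, for illustration only.
centers = [[0.0, 0.0], [10.0, 10.0]]
local_descs = [[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]]

descriptor = vlad(local_descs, centers)
print(len(descriptor))  # number of centers times local descriptor dimension
print(descriptor)
```

Because the output length scales with the number of centers, adding clusters is a simple way to obtain a higher-dimensional descriptor from the same local features.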

Index terms

Localization; Recognition; Aerial Systems: Perception and Autonomy
