UltraVPR: Unsupervised Lightweight Rotation-Invariant Aerial Visual Place Recognition
Chao Chen, Chunyu Li, Mengfan He, Jun Wang, Fei Xing, Ziyang Meng
AI summary
Problem
CNN-based visual place recognition models fail under the in-plane rotations typical of aerial imagery, while Transformer-based alternatives are too computationally heavy for resource-constrained UAVs.
Approach
The model pairs a rotation-equivariant E2ResNet backbone with a rotation-invariant aggregation layer so that descriptors stay consistent across orientations. It is trained with an unsupervised clustering strategy that optimizes higher-dimensional descriptors during training without increasing the descriptor dimensionality used at deployment.
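To make the invariance property concrete, here is a minimal sketch, assuming a toy setting restricted to 90-degree rotations. It is not the paper's E2ResNet or its aggregation layer: it swaps in plain orbit pooling (running a feature function on all four rotated copies of the input and max-pooling the results), which yields the same kind of guarantee in a much cruder way. The names `rot90`, `toy_features`, and `invariant_descriptor` are hypothetical and exist only for this illustration.

```python
# Toy illustration only: C4 (90-degree) orbit pooling, NOT the paper's E2ResNet.

def rot90(img):
    """Rotate a 2D list-of-lists 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def toy_features(img):
    """Hypothetical orientation-sensitive features: position-weighted sums."""
    h, w = len(img), len(img[0])
    f_row = sum(r * img[r][c] for r in range(h) for c in range(w))
    f_col = sum(c * img[r][c] for r in range(h) for c in range(w))
    return (f_row, f_col)

def invariant_descriptor(img):
    """Max-pool each feature over the four 90-degree rotations of the input."""
    feats = []
    cur = img
    for _ in range(4):
        feats.append(toy_features(cur))
        cur = rot90(cur)
    return tuple(max(f[i] for f in feats) for i in range(len(feats[0])))

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
rotated = rot90(image)

# The raw features change when the image is rotated...
print(toy_features(image) != toy_features(rotated))
# ...while the pooled descriptor does not.
print(invariant_descriptor(image) == invariant_descriptor(rotated))
```

Orbit pooling costs one forward pass per rotation, which is the kind of overhead an equivariant backbone followed by an invariant aggregation step is meant to avoid by handling orientation inside a single pass.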
Key results
- Proposes a lightweight rotation-invariant VPR architecture combining E2ResNet and rotation-invariant aggregation
- Introduces an unsupervised training method using high-dimensional VLAD descriptors for optimization while maintaining low-dimensional deployment descriptors
- Outperforms state-of-the-art methods on VP-Air, UAV-VisLoc, and AerialVL datasets
- Achieves high Recall@1 performance with reduced memory and computational overhead for UAV deployment
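For readers unfamiliar with the Recall@1 metric mentioned above, the following is a minimal sketch of how Recall@K is commonly computed in place recognition: a query counts as a hit if any of its top K retrieved references is a true match. The function name `recall_at_k` and the toy rankings are assumptions for illustration; the exact positive-match criterion used for each dataset is not stated in this summary.

```python
def recall_at_k(ranked_ids, ground_truth, k=1):
    """Fraction of queries with at least one correct reference in the top k.

    ranked_ids:   per-query list of reference ids, best match first
    ground_truth: per-query set of reference ids counted as correct
    """
    hits = 0
    for ranking, positives in zip(ranked_ids, ground_truth):
        if any(ref in positives for ref in ranking[:k]):
            hits += 1
    return hits / len(ranked_ids)

# Toy data (hypothetical): three queries, three retrieved references each.
ranked = [[3, 1, 7], [2, 5, 0], [4, 6, 9]]
truth = [{3}, {5}, {8}]

print(recall_at_k(ranked, truth, k=1))
print(recall_at_k(ranked, truth, k=2))
```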
Why it matters
Provides a computationally efficient and rotation-robust localization solution critical for GPS-denied UAV navigation in real-world aerial missions.
Abstract
Aerial Visual Place Recognition (VPR) is critical for Unmanned Aerial Vehicle (UAV) localization, especially in environments with unstable or unavailable GPS signals. While neural network-based VPR methods have become mainstream, they face significant challenges on UAV platforms. Traditional CNN-based VPR models are highly sensitive to image rotation, which degrades their performance in aerial-domain environments. Meanwhile, Transformer-based models have high computational complexity, making them less suitable for resource-constrained UAVs. In this letter, we propose a lightweight, rotation-invariant aerial VPR method. Our approach combines a rotation-equivariant backbone network with a rotation-invariant aggregation layer to ensure descriptor consistency across different orientations. Additionally, we propose an unsupervised training strategy that constructs higher-dimensional descriptors to optimize the model, while maintaining the lower descriptor dimensionality during application. Experimental results show that our method outperforms state-of-the-art methods across multiple aerial VPR datasets. The code will be released at https://github.com/cbbhuxx/UltraVPR.
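The summary above identifies the higher-dimensional training descriptors as VLAD descriptors. As a rough sketch of why VLAD raises dimensionality, the snippet below implements textbook hard-assignment VLAD: each D-dimensional local feature is assigned to its nearest of K cluster centers, residuals are summed per center, and the result is flattened into a K x D vector. This is an assumption-laden illustration, not the paper's training pipeline; the function name `vlad`, the toy centers, and the use of hard rather than soft assignment are all choices made here.

```python
import math

def vlad(local_feats, centers):
    """Hard-assignment VLAD: sum residuals per nearest center, then L2-normalize."""
    k, d = len(centers), len(centers[0])
    acc = [[0.0] * d for _ in range(k)]
    for f in local_feats:
        # Index of the nearest center by squared Euclidean distance.
        j = min(range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(f, centers[i])))
        for t in range(d):
            acc[j][t] += f[t] - centers[j][t]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy data (hypothetical): K = 2 centers, D = 2 dimensions, 3 local features.
centers = [[0.0, 0.0], [10.0, 10.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
desc = vlad(feats, centers)

print(len(desc))  # K * D entries
```

With K centers, the descriptor grows from D to K x D dimensions, which illustrates how a training-time descriptor can be much larger than the one kept for deployment.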