AWENet: A Self-Supervised Network for Efficient Interest Point Detection and Description
Pengwei Jia, Kang Li, Siren Batu
AI summary
Problem
Existing end-to-end interest point detection and description networks struggle with high computational overhead, limited descriptor discriminability in self-supervised settings, and suboptimal localization accuracy.
Approach
The proposed network combines accelerated convolutional expansion, wavelet-based downsampling to preserve multi-frequency details, and multi-scale attention, all guided by multi-objective knowledge distillation from a teacher model.
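To make the wavelet-based downsampling idea concrete, here is a minimal NumPy sketch of a one-level 2D Haar transform. It is an illustration under assumptions, not the authors' layer: the summary does not state which wavelet AWENet uses or how the sub-bands are fused, and the function names `haar_downsample` and `haar_upsample` are hypothetical. The point it shows is that halving resolution this way keeps the high-frequency detail in separate sub-bands instead of discarding it, so the input can be reconstructed exactly.

```python
import numpy as np

def haar_downsample(x):
    """One-level 2D Haar transform of an (H, W) array with even H and W.

    Returns four (H/2, W/2) sub-bands: one low-frequency approximation
    and three high-frequency detail bands. Assumed illustration only.
    """
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0      # block average (scaled)
    left_right = (a - b + c - d) / 2.0  # left column minus right column
    top_bottom = (a + b - c - d) / 2.0  # top row minus bottom row
    diagonal = (a - b - c + d) / 2.0    # diagonal difference
    return approx, left_right, top_bottom, diagonal

def haar_upsample(approx, left_right, top_bottom, diagonal):
    """Invert haar_downsample, recovering the full-resolution array."""
    h, w = approx.shape
    x = np.empty((2 * h, 2 * w), dtype=approx.dtype)
    x[0::2, 0::2] = (approx + left_right + top_bottom + diagonal) / 2.0
    x[0::2, 1::2] = (approx - left_right + top_bottom - diagonal) / 2.0
    x[1::2, 0::2] = (approx + left_right - top_bottom - diagonal) / 2.0
    x[1::2, 1::2] = (approx - left_right - top_bottom + diagonal) / 2.0
    return x

image = np.arange(16, dtype=np.float64).reshape(4, 4)
bands = haar_downsample(image)
restored = haar_upsample(*bands)
```

In a network, the four sub-bands would typically be stacked along the channel axis and passed to the next convolution. Strided convolution or max pooling, by contrast, has no exact inverse.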
Key results
- Lowest localization error on HPatches
- Top homography estimation accuracy at 3 and 5 pixel thresholds
- Highest mean matching accuracy under illumination changes
- Significantly improved processing speed with competitive matching scores
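For readers unfamiliar with the thresholded homography metric, the sketch below shows one common way it is computed on HPatches: warp the four image corners with the estimated and ground-truth homographies, take the mean corner distance, and count an image pair as correct if that distance is within the pixel threshold. This protocol is an assumption here, since the summary does not spell out the exact evaluation procedure, and the helper names are hypothetical.

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of (x, y) points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                   # back to Cartesian

def corner_error(H_est, H_gt, width, height):
    """Mean distance between image corners warped by H_est and by H_gt."""
    corners = np.array(
        [[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]],
        dtype=np.float64,
    )
    diff = warp_points(H_est, corners) - warp_points(H_gt, corners)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

def homography_accuracy(errors, thresholds=(3, 5)):
    """Fraction of image pairs whose corner error is within each threshold."""
    errors = np.asarray(errors, dtype=np.float64)
    return {t: float(np.mean(errors <= t)) for t in thresholds}

# Toy example: the estimate is off by a pure 2-pixel horizontal shift.
H_gt = np.eye(3)
H_est = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
err = corner_error(H_est, H_gt, width=640, height=480)
acc = homography_accuracy([err, 4.0, 6.0])
```

In the toy example the three corner errors are 2, 4, and 6 pixels, so one pair of three passes the 3-pixel threshold and two pass the 5-pixel threshold.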
Why it matters
Enables efficient, high-precision local feature extraction for real-time computer vision tasks like SLAM and visual localization on resource-constrained hardware.
Abstract
We introduce AWENet (Attention-guided Wavelet Enhancement Network), an efficient self-supervised network for joint interest point detection and description that balances computational speed with feature accuracy. The network preserves fine structural details while employing multi-scale attention to enhance the discriminability of descriptors, leading to more precise and reliable interest point correspondences. Evaluations on the HPatches dataset demonstrate that AWENet achieves competitive performance in repeatability, localization accuracy, and matching robustness. Its lightweight design ensures fast processing and low computational cost, making it well-suited for applications where efficiency is critical. Qualitative results show that the network generates dense and accurate correspondences under diverse transformations, including changes in viewpoint and illumination. Overall, AWENet provides a practical and effective solution for learning local features, achieving strong matching performance without relying on heavy computation.