ICRA 2024

Globalizing Local Features: Image Retrieval Using Shared Local Features with Pose Estimation for Faster Visual Localization

Song Wenzheng, Ran Yan, Boshu Lei, Takayuki Okatani

Abstract

Visual localization is an important sub-task in SfM and visual SLAM that involves estimating a 6-DoF camera pose for an input query image relative to a given 3D model of the environment. The most accurate approach is a hierarchical one that splits the task into two stages: image retrieval and camera pose estimation. Each stage requires different image features: global features compactly encode holistic image information for the first stage, and local features encode the appearance around salient image points for the second stage. Existing methods extract these features with two independent networks, one for global and one for local features, which is suboptimal in terms of computational efficiency. In this paper, we propose a novel approach that achieves state-of-the-art inference accuracy with significantly improved efficiency. Our approach’s core component is SuperGF, a network that aggregates local features optimized for camera pose estimation to create a global feature that enables precise image retrieval. Through extensive experiments on standard benchmarks, we demonstrate that the method offers a better trade-off between accuracy and computational cost.
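The idea of sharing local features between the two stages can be illustrated with a minimal sketch. It is not the SuperGF network: the learned aggregation is replaced here by plain mean pooling followed by L2 normalization, purely as a stand-in, and the function names (`aggregate_local_features`, `retrieve_top_k`) and the toy 2-D descriptors are hypothetical. The sketch covers only the retrieval stage; in a full pipeline the same local descriptors would presumably be reused for matching and pose estimation against the retrieved images.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def aggregate_local_features(local_descs: Sequence[Sequence[float]]) -> Vector:
    """Stand-in aggregation (assumption): mean-pool the local descriptors
    of one image and normalize the result into a global descriptor."""
    if not local_descs:
        raise ValueError("need at least one local descriptor")
    dim = len(local_descs[0])
    count = len(local_descs)
    mean = [sum(d[i] for d in local_descs) / count for i in range(dim)]
    return l2_normalize(mean)


def retrieve_top_k(query_global: Vector, db_globals: Sequence[Vector], k: int) -> List[int]:
    """Rank database images by cosine similarity to the query and
    return the indices of the k best candidates."""
    scores = [sum(q * d for q, d in zip(query_global, g)) for g in db_globals]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy local descriptors (hypothetical values), two per image.
db_local = [
    [[1.0, 0.0], [0.9, 0.1]],  # image 0
    [[0.0, 1.0], [0.1, 0.9]],  # image 1
    [[0.7, 0.7], [0.6, 0.8]],  # image 2
]
query_local = [[0.95, 0.05], [1.0, 0.0]]

# Stage 1: local features are extracted once, then aggregated for retrieval.
db_globals = [aggregate_local_features(d) for d in db_local]
query_global = aggregate_local_features(query_local)
candidates = retrieve_top_k(query_global, db_globals, k=2)
print(candidates)

# Stage 2 (not shown): match query_local against the local features of
# the candidate images and estimate the 6-DoF pose from the matches.
```

Because the global descriptor is derived from local features that are computed anyway, no second feature-extraction network is needed in this sketch, which is the efficiency argument the abstract makes.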

Index terms

Localization; Deep Learning for Visual Perception; Computer Vision for Automation