IROS 2024

Geolocation on Cartographic Maps with Multi-Modal Fusion

Mengjie Zhou, Liu Liu, Yiran Zhong, Andrew Calway

Abstract

We explore the geolocation problem, aiming to localize ground-view images on cartographic maps without the need for any GPS priors. This task mimics the human wayfinding ability and offers high scalability and robustness by using the compact and semantic representations of maps. Current methods often rely on 2D maps to encode dense contextual information for ground-to-map matching. In this paper, we lift ground-to-map matching to a 2.5D space, where the heights of structures (e.g., buildings) provide richer geometric information to guide the matching process. We propose a new approach to learning representative embeddings from multi-modal data. Specifically, we establish a projection relationship between 2D and 2.5D space. The projection is then used to combine multi-modal features from the 2D and 2.5D maps using an effective pixel-to-point fusion method. By encoding crucial geometric cues, our method learns discriminative location embeddings for matching panoramic images and maps. Additionally, we construct the first large-scale multi-modal geolocation dataset to validate our method and facilitate future research. We conduct both single-image-based and route-based geolocation experiments to test our method. Extensive experiments demonstrate that the proposed method achieves significantly higher geolocation accuracy and faster convergence than previous 2D map-based approaches.
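
The abstract does not specify how the pixel-to-point fusion is implemented, so the following is only a minimal sketch of the general idea under stated assumptions: each 2.5D point (x, y, height) is projected onto the 2D map raster by dropping its height, and the feature stored at that pixel is concatenated with the point's own feature. The names `project_to_pixel` and `fuse_pixel_to_point`, the `origin` and `resolution` parameters, and the use of plain concatenation are all hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, height) in map coordinates


def project_to_pixel(
    point: Point, origin: Tuple[float, float], resolution: float
) -> Tuple[int, int]:
    """Drop the height and map (x, y) to a (row, col) cell of the 2D raster."""
    x, y, _height = point
    col = math.floor((x - origin[0]) / resolution)
    row = math.floor((y - origin[1]) / resolution)
    return row, col


def fuse_pixel_to_point(
    points: Sequence[Point],
    point_feats: Sequence[Sequence[float]],
    pixel_feats: Sequence[Sequence[Sequence[float]]],
    origin: Tuple[float, float] = (0.0, 0.0),
    resolution: float = 1.0,
) -> List[List[float]]:
    """Concatenate each point feature with the 2D feature at its projected pixel.

    pixel_feats is indexed as [row][col][channel]. Concatenation is an
    assumption; a learned fusion layer could replace it.
    """
    n_rows, n_cols = len(pixel_feats), len(pixel_feats[0])
    fused = []
    for point, feat in zip(points, point_feats):
        row, col = project_to_pixel(point, origin, resolution)
        # Clamp so that points on or just outside the border reuse the edge pixel.
        row = min(max(row, 0), n_rows - 1)
        col = min(max(col, 0), n_cols - 1)
        fused.append(list(feat) + list(pixel_feats[row][col]))
    return fused
```

In this sketch, every fused vector has as many channels as the point feature and the pixel feature combined, so a downstream encoder would see the height-aware and the 2D contextual cues side by side for each point.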

Index terms

Localization; Semantic Scene Understanding; Data Sets for Robotic Vision