GeoISF: Instance Semantic Forest Inspired Large-Scale Cross-View Geo-Localization Via Ground LiDAR-to-Satellite Image
Di Hu, Xia Yuan, Chun-xia Zhao
AI summary
Problem
Existing cross-view localization methods struggle in large-scale scenarios due to the modality gap between ground LiDAR point clouds and satellite imagery, along with inefficient semantic matching across vast geospatial databases.
Approach
GeoISF builds a hierarchical Instance Semantic Forest from multi-frame LiDAR data using WordNet ontology, then progressively filters and matches satellite image patches through shared textual and structural semantics.
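The idea of merging per-frame semantic trees and then pruning candidates from coarse to fine levels can be illustrated with a minimal sketch. It is not the authors' implementation: the `HYPERNYMS` table is a hypothetical hand-written stand-in for WordNet lookups, the function names are invented for illustration, and the strict "candidate must cover every query node" rule is a simplifying assumption in place of the paper's textual and structural matching.

```python
from collections import Counter

# Hypothetical hand-written hypernym table standing in for WordNet lookups.
HYPERNYMS = {
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "object",
    "building": "structure",
    "fence": "structure",
    "structure": "object",
    "tree": "vegetation",
    "vegetation": "object",
}

def path_to_root(label):
    """Return the label followed by its ancestors, e.g. car, vehicle, object."""
    path = [label]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path

def build_forest(frames):
    """Merge instance labels from several frames into node counts per depth (root = 0)."""
    forest = {}
    for labels in frames:
        for label in labels:
            path = path_to_root(label)[::-1]  # root first
            for depth, node in enumerate(path):
                forest.setdefault(depth, Counter())[node] += 1
    return forest

def progressive_filter(query_forest, candidates):
    """Drop candidates level by level, coarse categories first (assumed strict cover rule)."""
    survivors = dict(candidates)
    for depth in sorted(query_forest):
        needed = set(query_forest[depth])
        survivors = {
            name: forest
            for name, forest in survivors.items()
            if needed <= set(forest.get(depth, ()))
        }
    return sorted(survivors)

# Two LiDAR frames with toy instance labels, and two toy satellite patches.
query = build_forest([["car", "building"], ["car", "tree"]])
patches = {
    "patch_a": build_forest([["car", "building", "tree", "fence"]]),
    "patch_b": build_forest([["truck", "tree"]]),
}
print(progressive_filter(query, patches))
```

In this toy run `patch_b` is discarded at the second level because it has no `structure` node, so the finer leaf-level comparison is only performed on `patch_a`. That early rejection is the intuition behind pruning a large candidate set cheaply.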
Key results
- First LiDAR-to-image pipeline for large-scale cross-view geo-localization
- Instance Semantic Forest construction via WordNet ontology for temporal semantic representation
- Progressive semantic distillation efficiently prunes candidate satellite patches
- Achieves 91.53% recall at top 1% (R@1%) on KITTI, outperforming the state of the art by 2.97×
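For readers unfamiliar with the R@1% metric quoted above, the following sketch shows one common way to compute it, assuming the usual retrieval convention that a query counts as a hit when its true match appears within the top 1% of ranked candidates. The function name and the ceiling-based cutoff are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
import math

def recall_at_percent(rankings, truths, percent=1.0):
    """Fraction of queries whose true match lies in the top `percent`% of its ranking."""
    hits = 0
    for ranked, truth in zip(rankings, truths):
        # Assumed cutoff: at least one candidate, rounded up.
        k = max(1, math.ceil(len(ranked) * percent / 100.0))
        if truth in ranked[:k]:
            hits += 1
    return hits / len(truths)

# Two toy queries over 200 candidates each, so the top 1% is the first 2 entries.
rankings = [list(range(200)), list(range(200))]
truths = [1, 5]
print(recall_at_percent(rankings, truths))
```

Here the first query's true match (candidate 1) falls within the top two results and the second query's (candidate 5) does not, so the sketch reports a recall of one half.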
Why it matters
Enables reliable, large-scale autonomous navigation and robotics by overcoming critical limitations of current ground-to-aerial localization techniques.
Abstract
The problem of localization on a large-scale satellite image given a frame of query ground view point clouds remains challenging. Existing LiDAR-to-image cross-view localization methods struggle in large-scale scenarios due to limited semantic alignment and the modality gap between point clouds and satellite images. This paper introduces the large-scale LiDAR-to-image geo-localization pipeline called GeoISF. GeoISF introduces an instance semantic forest constructed using WordNet, which enhances temporal semantic representation and discriminative power by integrating semantic trees from multiple frames. By leveraging environmental semantic representation as a shared medium, GeoISF effectively bridges the modality gap and improves semantic matching accuracy. Extensive experiments demonstrate the superior performance of GeoISF in large-scale cross-view localization, 13.22 times better than the parallel LiDAR-to-image method in the R@10 metric on the KITTI dataset. The proposed method addresses the existing gap in large-scale LiDAR-to-image cross-view localization, offering a robust solution to the computational and accuracy challenges inherent in such scenarios. We will release the code as an open-source resource available online for the broader research community.