ICRA 2026

GeoISF: Instance Semantic Forest Inspired Large-Scale Cross-View Geo-Localization Via Ground LiDAR-to-Satellite Image

Di Hu, Xia Yuan, Chun-xia Zhao

AI summary

GeoISF bridges the LiDAR-to-satellite modality gap using an instance semantic forest and semantic distillation, achieving an R@10 that is 13.22× better than the parallel LiDAR-to-image method on the KITTI dataset.

Large-scale localization, LiDAR-to-image, Semantic forest, Cross-view matching, Autonomous navigation, Semantic distillation

Problem

Existing cross-view localization methods struggle in large-scale scenarios due to the modality gap between ground LiDAR point clouds and satellite imagery, along with inefficient semantic matching across vast geospatial databases.

Approach

GeoISF builds a hierarchical Instance Semantic Forest from multi-frame LiDAR data using the WordNet ontology, then progressively filters and matches satellite image patches through shared textual and structural semantics.
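The idea of merging per-frame semantic trees into a forest can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `HYPERNYM` table is a small hand-written stand-in for WordNet, and the names `path_to_root`, `build_forest`, and `forest_roots` are hypothetical. Each detected instance label is lifted along its hypernym chain, and counts are accumulated over all frames so that every concept node records how many instances fall at or below it.

```python
from collections import Counter, defaultdict

# Assumption: a tiny hand-written hypernym table standing in for WordNet.
# Each label maps to its parent concept; root concepts map to None.
HYPERNYM = {
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "artifact",
    "building": "structure",
    "fence": "structure",
    "structure": "artifact",
    "tree": "plant",
    "plant": None,
    "artifact": None,
}


def path_to_root(label):
    """Return the chain [label, parent, ..., root] in the hypernym table."""
    path = [label]
    while HYPERNYM.get(path[-1]) is not None:
        path.append(HYPERNYM[path[-1]])
    return path


def build_forest(frames):
    """Merge instance labels from several frames into one forest.

    frames: list of lists of instance labels, one inner list per LiDAR frame.
    Returns (counts, children), where counts[node] is the number of instances
    at or below node over all frames and children[node] is the set of child
    concepts observed under node.
    """
    counts = Counter()
    children = defaultdict(set)
    for labels in frames:
        for label in labels:
            path = path_to_root(label)
            for node in path:
                counts[node] += 1
            for child, parent in zip(path, path[1:]):
                children[parent].add(child)
    return counts, children


def forest_roots(counts):
    """Concepts with no parent in the table are the roots of the forest."""
    return sorted(node for node in counts if HYPERNYM.get(node) is None)


frames = [["car", "building"], ["car", "tree"]]
counts, children = build_forest(frames)
print(forest_roots(counts))
print(dict(counts))
```

In this sketch, two frames that both observe a car reinforce the same `vehicle` and `artifact` nodes, which is one simple way multi-frame evidence could make the representation more discriminative than a single frame.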

Key results

First LiDAR-to-image pipeline for large-scale cross-view geo-localization
Instance Semantic Forest construction via WordNet ontology for temporal semantic representation
Progressive semantic distillation efficiently prunes candidate satellite patches
Achieves 91.53% R@1% on KITTI, outperforming the state of the art by 2.97×
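A coarse-to-fine pruning step of the kind named in the results above might look like the following Python sketch. This is an illustration under stated assumptions, not the paper's procedure: the summary does not specify the matching score, so Jaccard overlap between concept sets is used as a stand-in, and `jaccard`, `progressive_filter`, and `min_overlap` are hypothetical names. Candidates are compared level by level, and a patch that fails at a coarse level is never examined at finer levels.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def progressive_filter(query_levels, candidates, min_overlap=0.6):
    """Prune candidate patches level by level, coarse to fine.

    query_levels: list of concept sets for the query, coarsest level first.
    candidates: dict mapping patch id to a list of concept sets in the
                same level order as query_levels.
    Returns the ids of patches that pass every level, in input order.
    """
    survivors = list(candidates)
    for level, query in enumerate(query_levels):
        survivors = [
            pid for pid in survivors
            if jaccard(query, candidates[pid][level]) >= min_overlap
        ]
    return survivors


# Toy data (assumed): three patches described at three semantic levels.
query_levels = [
    {"artifact", "plant"},
    {"vehicle", "structure", "plant"},
    {"car", "building", "tree"},
]
candidates = {
    "p0": [{"artifact", "plant"}, {"vehicle", "structure", "plant"}, {"car", "building", "tree"}],
    "p1": [{"artifact", "plant"}, {"structure", "plant"}, {"fence", "tree"}],
    "p2": [{"plant"}, {"plant"}, {"tree"}],
}
print(progressive_filter(query_levels, candidates))
```

The design choice illustrated here is that cheap coarse comparisons discard most of the database early, so the more specific comparisons run only on the few patches that remain.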

Why it matters

Enables reliable, large-scale autonomous navigation and robotics by overcoming critical limitations of current ground-to-aerial localization techniques.

Abstract

The problem of localization on a large-scale satellite image given a frame of query ground-view point clouds remains challenging. Existing LiDAR-to-image cross-view localization methods struggle in large-scale scenarios due to limited semantic alignment and the modality gap between point clouds and satellite images. This paper introduces the large-scale LiDAR-to-image geo-localization pipeline called GeoISF. GeoISF introduces an instance semantic forest constructed using WordNet, which enhances temporal semantic representation and discriminative power by integrating semantic trees from multiple frames. By leveraging environmental semantic representation as a shared medium, GeoISF effectively bridges the modality gap and improves semantic matching accuracy. Extensive experiments demonstrate the superior performance of GeoISF in large-scale cross-view localization, 13.22 times better than the parallel LiDAR-to-image method in the R@10 metric on the KITTI dataset. The proposed method addresses the existing gap in large-scale LiDAR-to-image cross-view localization, offering a robust solution to the computational and accuracy challenges inherent in such scenarios. We will release the code as an open-source resource available online for the broader research community.
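For readers unfamiliar with the retrieval metrics quoted here, the following Python sketch shows one common way R@K and R@1% are computed. The exact evaluation protocol of the paper is not given on this page, so this is an assumption: a query counts as a hit when its ground-truth patch appears among the top K retrieved patches. The names `recall_at_k` and `k_for_percent` and the toy data are hypothetical.

```python
import math


def recall_at_k(ranked_lists, ground_truth, k):
    """Fraction of queries whose true patch is among the top-k results.

    ranked_lists: one ranked list of patch ids per query, best match first.
    ground_truth: the correct patch id for each query, in the same order.
    """
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, ground_truth)
        if truth in ranked[:k]
    )
    return hits / len(ranked_lists)


def k_for_percent(db_size, percent):
    """Convert 'top percent of the database' into a cutoff K (at least 1)."""
    return max(1, math.ceil(db_size * percent / 100))


# Toy data (assumed): three queries, all with true patch "a".
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truth = ["a", "a", "a"]
print(recall_at_k(ranked, truth, 1))
print(recall_at_k(ranked, truth, 2))
print(k_for_percent(1000, 1))
```

Under this definition, a ratio such as "13.22 times better in R@10" compares two such recall values computed with the same K on the same queries.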

Index terms

Semantic Scene Understanding, Localization, Range Sensing