ICRA 2026

Semantic-Augmented 3D Gaussian Splatting for Visual Localization in Complex Indoor Environments

Ba Tuan Hoang Chu, Gon-Woo Kim

AI summary

Fusing semantic object filtering with 3D Gaussian Splatting and a coarse-to-fine matching strategy improves visual localization accuracy and robustness in complex, dynamic indoor environments.

Visual localization, 3D Gaussian Splatting, Semantic mapping, Place recognition, Coarse-to-fine matching, Indoor navigation

Problem

Conventional visual localization methods lose accuracy and robustness in complex indoor settings due to dynamic scene changes, occlusions, and repetitive structures that break feature matching. Existing training-dependent or NeRF-based approaches also suffer from poor generalization, high computational costs, and lighting sensitivity.

Approach

The framework integrates semantic information into a 3D Gaussian map to isolate stable, distinctive objects, then uses a large language model to guide robust map construction. A novel coarse-to-fine matching strategy optimizes reference viewpoints for rendering, followed by iterative PnP refinement to precisely estimate the camera pose.
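The coarse-to-fine idea can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate views, their global descriptors, the sets of local feature IDs, and the function name `coarse_to_fine` are all hypothetical stand-ins. The coarse stage ranks rendered reference views by descriptor similarity to the query; the fine stage picks the shortlisted view with the most local feature matches. The PnP refinement step is omitted here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coarse_to_fine(query_desc, query_feats, candidates, top_k=2):
    """Pick a reference view in two stages (hypothetical helper).

    Coarse: rank candidates by global-descriptor similarity, keep top_k.
    Fine: among the shortlist, choose the view sharing the most local
    feature IDs with the query.
    """
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_desc, c["desc"]),
        reverse=True,
    )
    shortlist = ranked[:top_k]
    best = max(shortlist, key=lambda c: len(query_feats & c["feats"]))
    return best["name"], [c["name"] for c in shortlist]


# Toy data: descriptors and feature IDs are made up for illustration.
candidates = [
    {"name": "view_a", "desc": [1.0, 0.0, 0.0], "feats": {1, 2, 3}},
    {"name": "view_b", "desc": [0.9, 0.1, 0.0], "feats": {3, 4, 5, 6}},
    {"name": "view_c", "desc": [0.0, 1.0, 0.0], "feats": {4, 5, 6, 7}},
]
best, shortlist = coarse_to_fine([1.0, 0.05, 0.0], {4, 5, 6}, candidates)
print(best, shortlist)
```

In this toy case view_a scores highest in the coarse stage, but view_b wins the fine stage because it shares more local features with the query. That is the reason for keeping a shortlist rather than committing to the single best coarse match.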

Key results

Semantic-augmented 3D Gaussian map filters dynamic objects to enhance environmental awareness
Coarse-to-fine matching strategy with overview-shot generation and viewpoint filtering optimizes reference image rendering
Superior pose estimation accuracy and robustness across multiple complex indoor datasets compared to state-of-the-art methods
Training-free design enables strong generalization to diverse and dynamically changing indoor environments
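As a rough illustration of the semantic filtering in the first result, the sketch below drops Gaussians whose semantic label falls in a set of dynamic classes. The `Gaussian` record, the label names, and the `DYNAMIC_LABELS` set are assumptions made for this example; the summary does not state how labels are assigned or which classes the paper treats as dynamic.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Simplified stand-in for one labeled 3D Gaussian (hypothetical)."""
    position: tuple  # (x, y, z) center
    label: str       # semantic class assigned to this Gaussian


# Assumed set of classes considered dynamic; illustrative only.
DYNAMIC_LABELS = {"person", "chair", "door"}


def filter_stable(gaussians, dynamic_labels=DYNAMIC_LABELS):
    """Keep only Gaussians whose label is not in the dynamic set."""
    return [g for g in gaussians if g.label not in dynamic_labels]


scene = [
    Gaussian((0.0, 0.0, 1.0), "wall"),
    Gaussian((1.2, 0.4, 0.9), "person"),
    Gaussian((2.0, 1.0, 0.5), "cabinet"),
    Gaussian((0.5, 1.5, 0.4), "chair"),
]
stable = filter_stable(scene)
print([g.label for g in stable])
```

Only the Gaussians that survive the filter would then be used for rendering and feature matching, so objects that are likely to move contribute no features.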

Why it matters

Provides a reliable, generalizable localization solution for autonomous robots navigating real-world indoor spaces where traditional vision-based methods fail.

Abstract

This letter presents a new visual localization framework for complex indoor environments under dynamic scene change conditions. Conventional visual localization methods often struggle to maintain accuracy and robustness in such environments, where frequent scene changes, occlusions, diverse object categories, and intricate scene structures significantly affect feature consistency and matching reliability. These challenges highlight the need for a more adaptive and semantically aware localization approach. By proposing an algorithm that integrates semantic information with a Gaussian map as input, the method enhances the algorithm's environmental awareness. This allows robust objects to be identified and extracted, thereby improving feature extraction performance and consequently enhancing pose estimation precision. Furthermore, a new coarse-to-fine matching strategy has been developed that takes an overview of the Gaussian map, from which suitable viewpoints are generated to produce the best matching images. Rendered images produced from the Gaussian map are employed in subsequent stages to improve comparison effectiveness, thereby enabling the determination of the most accurate camera pose. Finally, the capability of the proposed methodology is confirmed through experiments on different types of datasets.

Index terms

Semantic Scene Understanding, Localization, Recognition