ICRA 2026

GeoGS-SLAM: Online Monocular Reconstruction Using Gaussian Splatting with Geometric Priors

Ruilan Gao, Letian Jin, Yu Zhang

AI summary

Synergizing 3D Gaussian Splatting with feed-forward geometric priors enables state-of-the-art, real-time monocular SLAM without depth sensors.

Monocular SLAM, 3D Gaussian Splatting, Geometric Priors, Dense Reconstruction, Real-time Mapping, Loop Closure

Problem

Existing monocular 3DGS-based SLAM degrades without depth sensors, while geometric prior-based SLAM often discards RGB data during optimization, compromising reconstruction fidelity and visual consistency.

Approach

The system predicts camera and scene priors from uncalibrated RGB using a pre-trained model, directly samples Gaussian primitives from both images and priors, and jointly optimizes poses and the map by minimizing photometric and geometric losses with online loop closure.
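The sampling and joint-loss steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the prior model supplies a per-pixel depth map and pinhole intrinsics K, that Gaussian centers are obtained by back-projecting a strided pixel grid, and that both losses are plain L1 terms blended by a weight. The names backproject, sample_gaussians, joint_loss, stride, and lam are hypothetical, and the differentiable renderer and coarse-to-fine schedule are not reproduced.

```python
import numpy as np


def backproject(depth, K, stride=4):
    """Lift every `stride`-th pixel to a 3D point in the camera frame.

    Assumes a pinhole camera with intrinsics K (3x3) and metric depth.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h:stride, 0:w:stride]  # pixel rows (v) and columns (u)
    z = depth[v, u]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


def sample_gaussians(rgb, depth, K, stride=4):
    """Initialize Gaussian means, colors, and isotropic scales (assumed scheme)."""
    means = backproject(depth, K, stride)
    colors = rgb[::stride, ::stride].reshape(-1, 3)
    # Assumption: initial scale equals the 3D footprint of one strided pixel.
    scales = means[:, 2] * stride / K[0, 0]
    return means, colors, scales


def joint_loss(render_rgb, rgb, render_depth, prior_depth, lam=0.9):
    """Blend an L1 photometric term with an L1 geometric (depth) term.

    `lam` is a hypothetical weight; the paper's actual loss may differ.
    """
    photometric = np.abs(render_rgb - rgb).mean()
    geometric = np.abs(render_depth - prior_depth).mean()
    return lam * photometric + (1.0 - lam) * geometric
```

In a full system the rendered color and depth would come from a differentiable Gaussian rasterizer, and the gradient of this loss would update both the camera pose and the Gaussian parameters.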

Key results

State-of-the-art rendering quality with PSNR gains of up to 4.94 dB
Tracking error reduced by up to 64.6% versus prior-based SLAM
Real-time online performance across indoor and outdoor benchmarks
Robust global consistency via integrated loop closure and pose graph optimization
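For readers unfamiliar with these metrics, the sketch below shows how such figures are commonly computed. It is illustrative only: the summary does not name the tracking metric, so absolute trajectory error (ATE) RMSE on pre-aligned trajectories is assumed here, and the helper names psnr, ate_rmse, and relative_reduction are hypothetical.

```python
import numpy as np


def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = rendered.astype(np.float64) - reference.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def ate_rmse(estimated, ground_truth):
    """RMSE of per-frame translation error.

    Assumes both (N, 3) trajectories are already aligned in the same frame.
    """
    sq_err = np.sum((estimated - ground_truth) ** 2, axis=1)
    return float(np.sqrt(np.mean(sq_err)))


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline
```

Because PSNR is logarithmic in the mean squared error, a gain of 4.94 dB on a single image corresponds to dividing the MSE by roughly 3.1 (10^0.494).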

Why it matters

Provides a robust, sensor-light solution for high-fidelity 3D mapping and localization, advancing robotics, autonomous driving, and AR systems that rely solely on monocular cameras.

Abstract

SLAM methods based on 3D Gaussian Splatting (3DGS) have demonstrated impressive tracking and mapping performance, but typically require additional geometric information from external depth sensors. Meanwhile, recent SLAM systems that leverage geometric priors from pre-trained feed-forward models enable real-time dense reconstruction, yet often discard original RGB information during optimization, thus degrading overall reconstruction quality. We present GeoGS-SLAM, an online monocular dense reconstruction system that combines the 3DGS-based map representation with learned geometric priors. Given uncalibrated RGB input, we first employ a feed-forward visual geometry model to predict camera and scene priors. The Gaussian scene map is then expanded by directly sampling Gaussian primitives from both RGB input and geometric priors. Camera poses and the scene map are jointly optimized through a coarse-to-fine strategy that minimizes both photometric and geometric losses. To ensure global consistency, we further incorporate online loop closure detection and pose graph optimization. Extensive experiments across indoor and outdoor benchmarks demonstrate that GeoGS-SLAM achieves superior rendering quality and tracking accuracy compared to state-of-the-art methods while maintaining online real-time performance. Project page: https://rlgao.github.io/geogs_slam.

This work was supported by the National Natural Science Foundation of China (Grant No. 62576311), in part by NSFC 62088101 Autonomous Intelligent Unmanned Systems, and in part by Zhejiang Provincial Natural Science Foundation of China under Grant No. LD24F030001.

1 State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, China, 310027.

2 Key Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of Zhejiang Province, Hangzhou, China, 310027.

* Corresponding author: Yu Zhang (Email: zhangyu80@zju.edu.cn).

Index terms

SLAM, Mapping, Localization