HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction
Wei Zhang, Qing Cheng, David Skuddis, Niclas Zeller, Daniel Cremers, Norbert Haala
AI summary
Problem
Monocular 3D reconstruction suffers from scale ambiguity and noisy depth estimates, while existing SLAM methods typically force a tradeoff between rendering quality and geometric accuracy or require expensive depth sensors.
Approach
The system uses a hybrid architecture that corrects monocular depth scale distortions via a grid-based alignment strategy and leverages 3D Gaussian Splatting as an explicit, incrementally growing map for fast online tracking and joint pose-geometry optimization.
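The grid-based alignment idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the image is split into a regular grid, that each cell gets a single closed-form least-squares scale, and that a sparse or partial reference depth (for example, from the SLAM front end) is available. The function name `grid_scale_align` and its parameters are hypothetical, and a real system would likely interpolate scales between cells and optimize them jointly with poses.

```python
def grid_scale_align(prior, reference, rows, cols):
    """Scale a monocular prior depth map per grid cell to match a reference.

    prior, reference: 2-D lists of floats with the same shape.
    Non-positive reference values are treated as missing (assumption).
    Returns (aligned_depth, per_cell_scales).
    """
    h, w = len(prior), len(prior[0])
    aligned = [row[:] for row in prior]
    scales = [[1.0] * cols for _ in range(rows)]
    for gi in range(rows):
        y0, y1 = gi * h // rows, (gi + 1) * h // rows
        for gj in range(cols):
            x0, x1 = gj * w // cols, (gj + 1) * w // cols
            num = den = 0.0
            for y in range(y0, y1):
                for x in range(x0, x1):
                    r = reference[y][x]
                    if r > 0:
                        p = prior[y][x]
                        num += p * r
                        den += p * p
            # Closed-form minimizer of sum (s * p - r)^2 over the cell;
            # fall back to 1.0 when the cell has no valid reference depth.
            s = num / den if den > 0 else 1.0
            scales[gi][gj] = s
            for y in range(y0, y1):
                for x in range(x0, x1):
                    aligned[y][x] = prior[y][x] * s
    return aligned, scales


# Toy example: the prior is too small on the left half, too large on the right.
prior = [[1.0, 1.0, 2.0, 2.0],
         [1.0, 1.0, 2.0, 2.0]]
reference = [[2.0, 2.0, 1.0, 1.0],
             [2.0, 2.0, 1.0, 1.0]]
aligned, scales = grid_scale_align(prior, reference, rows=1, cols=2)
```

A single global scale could not fix both halves of this toy depth map at once, which is the motivation for estimating scale per cell.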
Key results
- Grid-based scale alignment corrects monocular depth distortions
- 3D Gaussian Splatting enables efficient online mapping and high-quality rendering
- Surpasses RGB-D methods in both geometry accuracy and visual fidelity
- Hierarchical optimization reduces trajectory error by 29.3%
Why it matters
It enables real-time, high-fidelity 3D scene reconstruction and navigation using only lightweight, low-cost RGB cameras, eliminating the need for expensive depth sensors or LiDAR.
Abstract
We present HI-SLAM2, a geometry-aware Gaussian SLAM system that achieves fast and accurate monocular scene reconstruction using only RGB input. Existing neural SLAM or 3DGS-based SLAM methods often trade off rendering quality against geometry accuracy; our research demonstrates that both can be achieved simultaneously with RGB input alone. The key idea of our approach is to enhance the ability for geometry estimation by combining easy-to-obtain monocular priors with learning-based dense SLAM, and then to use 3-D Gaussian splatting as our core map representation to efficiently model the scene. Upon loop closure, our method ensures on-the-fly global consistency through efficient pose graph bundle adjustment and instant map updates by explicitly deforming the 3-D Gaussian units based on anchored keyframe updates. Furthermore, we introduce a grid-based scale alignment strategy to maintain improved scale consistency in prior depths for finer depth details. Through extensive experiments on the Replica, ScanNet, Waymo Open, ETH3D SLAM, and ScanNet++ datasets, we demonstrate significant improvements over existing neural SLAM methods and even surpass RGB-D-based methods in both reconstruction and rendering quality.
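The map update after loop closure can be sketched as a rigid re-anchoring of each Gaussian to its keyframe. This is a sketch under stated assumptions, not the paper's implementation: it assumes each Gaussian mean is attached to one keyframe, that poses are camera-to-world rotations and translations, and that only the mean is moved (covariance and scale handling are omitted). The helper names `mat_vec`, `transpose`, and `reanchor_mean` are hypothetical.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(R):
    """Transpose a 3x3 matrix; for a rotation this is also its inverse."""
    return [[R[j][i] for j in range(3)] for i in range(3)]


def reanchor_mean(mean, old_pose, new_pose):
    """Move a Gaussian mean so it keeps its position relative to its keyframe.

    old_pose, new_pose: (R, t) camera-to-world pose of the anchor keyframe
    before and after the pose-graph update (assumed rigid).
    """
    R_old, t_old = old_pose
    R_new, t_new = new_pose
    # World -> keyframe-local coordinates using the old pose.
    local = mat_vec(transpose(R_old), [mean[i] - t_old[i] for i in range(3)])
    # Keyframe-local -> world coordinates using the corrected pose.
    moved = mat_vec(R_new, local)
    return [moved[i] + t_new[i] for i in range(3)]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

old_pose = (identity, [0.0, 0.0, 0.0])
new_pose = (rot_z_90, [1.0, 0.0, 0.0])
new_mean = reanchor_mean([1.0, 2.0, 3.0], old_pose, new_pose)
```

Because each Gaussian is updated with a closed-form transform rather than by re-optimizing the map, an update of this kind can be applied immediately after the keyframe poses change.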