Dense Monocular SLAM in Real-Time with Structured Gaussian Representation
Shaofan Liu, Xing Wei, Chong Zhao, Aoxiang Tian, Bin Du
AI summary
Problem
Monocular dense SLAM struggles with tracking failures in low-texture environments and under rapid camera motion, while existing 3D Gaussian-based systems rely on end-to-end optimization that cannot efficiently balance camera tracking with high-quality scene reconstruction.
Approach
The method decouples tracking and mapping by feeding differentiable pseudo-depth maps from a structured 3D Gaussian representation into a traditional direct visual odometry pipeline, while using DUSt3R to guide Gaussian optimization and an octree structure for fast mesh extraction.
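The depth-guided direct alignment described above can be illustrated with a minimal sketch of the photometric residual that a direct visual odometry front end typically minimizes. This is not the authors' implementation: the function names, the pinhole intrinsics tuple `K = (fx, fy, cx, cy)`, the nested-list image layout, and the nearest-neighbor sampling are all assumptions chosen for brevity. Here `depth` stands in for a pseudo-depth map such as one rendered from the Gaussian model.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a known depth to a 3D point in the camera frame."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def transform(R, t, p):
    """Apply a rigid transform (3x3 rotation R, translation t) to point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p, fx, fy, cx, cy):
    """Pinhole projection; returns None for points at or behind the camera."""
    x, y, z = p
    if z <= 1e-9:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def photometric_residuals(ref_img, cur_img, depth, R, t, K):
    """Intensity differences between reference pixels and their warped
    locations in the current image, for a candidate relative pose (R, t).

    Assumption: images and depth are nested lists indexed [row][col];
    sampling is nearest-neighbor (real systems usually interpolate).
    """
    fx, fy, cx, cy = K
    h, w = len(ref_img), len(ref_img[0])
    residuals = []
    for v in range(h):
        for u in range(w):
            d = depth[v][u]
            if d <= 0:  # skip pixels without a valid pseudo-depth
                continue
            point = transform(R, t, backproject(u, v, d, fx, fy, cx, cy))
            uv = project(point, fx, fy, cx, cy)
            if uv is None:
                continue
            ui, vi = int(round(uv[0])), int(round(uv[1]))
            if 0 <= ui < w and 0 <= vi < h:
                residuals.append(cur_img[vi][ui] - ref_img[v][u])
    return residuals
```

A tracker would search for the pose (R, t) that minimizes the sum of squared residuals, usually with a robust loss and a coarse-to-fine image pyramid. Because the depth comes from the map rather than from feature matches, the residual remains defined in low-texture regions, although it is only as reliable as the rendered depth.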
Key results
- Lowest mean trajectory RMSE on Replica and TUM RGB-D datasets compared to existing methods
- Real-time dense 3D reconstruction with direct triangular mesh extraction from Gaussians
- Robust camera tracking in low-texture and rapid-motion scenarios via depth-guided visual odometry
- Integration of traditional visual odometry (VO) with 3DGS that outperforms monocular Gaussian-based SLAM baselines
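For readers unfamiliar with the trajectory RMSE metric cited in the results, the following is a minimal sketch of how an absolute trajectory error is commonly computed. It assumes the estimated and ground-truth positions are already time-associated and expressed in the same frame; monocular evaluations usually also require a similarity (scale) alignment first, which is omitted here. The function name `ate_rmse` is illustrative, and the summary does not state the exact evaluation protocol used.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root-mean-square translational error between two trajectories.

    Assumption: both inputs are equal-length sequences of (x, y, z)
    positions that have already been associated and aligned.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Lower values indicate an estimated trajectory that stays closer to the ground truth over the whole sequence.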
Why it matters
Provides a reliable, real-time dense mapping solution for autonomous systems and robotics operating with single-camera setups in challenging environments.
Abstract
Monocular dense SLAM faces significant challenges in low-texture environments and under rapid camera motions. The recent development of 3D Gaussian Splatting (3DGS) offers a promising approach for real-time dense 3D reconstruction. However, existing 3DGS-based SLAM systems employ end-to-end optimization frameworks, which often struggle to achieve both efficient camera tracking and high-quality scene reconstruction simultaneously. To address this challenge, we propose a dense decoupled SLAM system that seamlessly integrates traditional visual odometry with 3DGS within a unified framework. Our system utilizes dense direct image alignment using pseudo-depth maps rendered from a global model, which is represented by an octree-managed structured Gaussian representation. This structured Gaussian supports fast rendering and efficient mesh extraction. Furthermore, we adopt a stereo 3D reconstruction model to generate dense depth maps from visual odometry for optimizing the 3D Gaussians. Experimental results demonstrate that our framework achieves state-of-the-art performance in both tracking robustness and reconstruction quality, outperforming existing monocular Gaussian-based SLAM systems, while maintaining real-time efficiency.
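To make the idea of an octree-managed representation more concrete, here is a minimal sketch of an octree that stores 3D points (for example, Gaussian centers) and subdivides a cell once it holds too many of them. It is a generic illustration, not the paper's data structure: the class name `OctreeNode`, the `max_points` and `max_depth` parameters, and the splitting rule are all assumptions, and the structured Gaussian attributes are left out entirely.

```python
class OctreeNode:
    """A cubic cell that holds points directly or delegates to 8 children."""

    def __init__(self, center, half_size, depth=0):
        self.center = center        # (x, y, z) of the cell center
        self.half_size = half_size  # half of the cell's edge length
        self.depth = depth
        self.points = []            # points stored while this node is a leaf
        self.children = None        # list of 8 OctreeNode once split

    def _child_index(self, p):
        # One bit per axis: bit 0 for x, bit 1 for y, bit 2 for z.
        cx, cy, cz = self.center
        return int(p[0] >= cx) | (int(p[1] >= cy) << 1) | (int(p[2] >= cz) << 2)

    def insert(self, p, max_points=2, max_depth=8):
        """Insert point p, splitting a leaf that exceeds max_points."""
        if self.children is not None:
            self.children[self._child_index(p)].insert(p, max_points, max_depth)
            return
        self.points.append(p)
        if len(self.points) > max_points and self.depth < max_depth:
            self._split(max_points, max_depth)

    def _split(self, max_points, max_depth):
        h = self.half_size / 2
        cx, cy, cz = self.center
        self.children = [
            OctreeNode(
                (cx + (h if i & 1 else -h),
                 cy + (h if i & 2 else -h),
                 cz + (h if i & 4 else -h)),
                h,
                self.depth + 1,
            )
            for i in range(8)
        ]
        # Redistribute the stored points into the new children.
        pending, self.points = self.points, []
        for q in pending:
            self.children[self._child_index(q)].insert(q, max_points, max_depth)

    def count(self):
        """Total number of points stored in this subtree."""
        if self.children is None:
            return len(self.points)
        return sum(child.count() for child in self.children)
```

Organizing primitives in a hierarchy like this keeps spatial queries local to a few cells, which is one plausible reason such a structure helps with fast rendering and mesh extraction.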