Unified Neural Gaussian SLAM with Feature Splatting
Xuyang Tang, Henry Chu, Yuxiang Sun
AI summary
Problem
Current 3D Gaussian Splatting SLAM methods suffer from scene inconsistency, visual artifacts, and high memory costs due to maintaining millions of independent Gaussians, while failing to leverage correlations between different image modalities.
Approach
The method represents scenes as a continuous multi-scale feature plane that encodes spatial, directional, and distance information to generate Gaussian parameters and high-dimensional feature embeddings, which are rasterized and decoded for multiple outputs.
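The pipeline described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The plane resolutions, channel counts, the x-y projection of positions onto a single plane, the two-layer MLP, and the names bilinear_sample and neural_gaussian_params are all assumptions made for the example. Rasterization and the per-modality decoders are omitted.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Sample an (H, W, C) feature plane at continuous coords uv in [0, 1]^2."""
    h, w, _ = plane.shape
    x = uv[:, 0] * (w - 1)
    y = uv[:, 1] * (h - 1)
    # Clamp the lower corner so that x0 + 1 and y0 + 1 stay inside the grid.
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]
    top = plane[y0, x0] * (1 - fx) + plane[y0, x0 + 1] * fx
    bot = plane[y0 + 1, x0] * (1 - fx) + plane[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def neural_gaussian_params(planes, mlp, positions, cam_center):
    """Map 3D positions to hypothetical Gaussian parameters and embeddings."""
    # Assumption: the plane spans the x-y extent of a unit cube.
    uv = positions[:, :2]
    # Multi-scale lookup: concatenate features from every resolution.
    feats = np.concatenate([bilinear_sample(p, uv) for p in planes], axis=1)
    # Directional and distance information relative to the camera.
    offset = positions - cam_center
    dist = np.linalg.norm(offset, axis=1, keepdims=True)
    view_dir = offset / np.maximum(dist, 1e-8)
    x = np.concatenate([feats, positions, view_dir, dist], axis=1)
    w1, b1, w2, b2 = mlp
    hidden = np.maximum(x @ w1 + b1, 0.0)         # ReLU
    out = hidden @ w2 + b2
    opacity = 1.0 / (1.0 + np.exp(-out[:, :1]))   # sigmoid keeps it in (0, 1)
    scale = np.exp(out[:, 1:4])                   # exp keeps it positive
    embedding = out[:, 4:]                        # high-dimensional feature to splat
    return opacity, scale, embedding

rng = np.random.default_rng(0)
channels, embed_dim, hidden_dim = 8, 16, 32       # illustrative sizes only
planes = [rng.normal(size=(r, r, channels)) for r in (16, 32, 64)]
in_dim = 3 * channels + 3 + 3 + 1                 # features + position + direction + distance
mlp = (
    0.1 * rng.normal(size=(in_dim, hidden_dim)),
    np.zeros(hidden_dim),
    0.1 * rng.normal(size=(hidden_dim, 4 + embed_dim)),
    np.zeros(4 + embed_dim),
)
positions = rng.uniform(size=(5, 3))
cam_center = np.array([0.5, 0.5, -2.0])
opacity, scale, embedding = neural_gaussian_params(planes, mlp, positions, cam_center)
```

Because the Gaussian parameters are generated from shared planes and a shared MLP rather than stored per Gaussian, the memory cost in a design like this depends on plane resolution, not on the number of Gaussians.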
Key results
- Unified latent feature space enabling multi-modal decoding
- State-of-the-art rendering quality with +0.89 dB PSNR gain on Replica
- Improved mesh reconstruction with sharper edges and lower depth L1 error
- Competitive tracking accuracy with reduced trajectory error on standard benchmarks
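To put the reported PSNR gain in context, the short sketch below uses the standard PSNR definition, 10 * log10(MAX^2 / MSE), to convert a gain in dB into the equivalent change in mean squared error. It assumes images normalized to [0, 1]. The evaluation protocol used in the paper is not stated in this summary, and the helper names are illustrative.

```python
import numpy as np

def psnr(reference, rendered, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((reference - rendered) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Ratio new_mse / old_mse implied by a PSNR improvement of gain_db."""
    return 10.0 ** (-gain_db / 10.0)

reference = np.zeros((4, 4))
baseline = np.full((4, 4), 0.1)                    # MSE 0.01, so PSNR is 20 dB
ratio = mse_ratio_for_gain(0.89)                   # about 0.815
improved = np.full((4, 4), 0.1 * np.sqrt(ratio))   # error scaled to match the gain
gain = psnr(reference, improved) - psnr(reference, baseline)
```

Under this definition, a +0.89 dB gain corresponds to roughly 18.5% lower mean squared error at the same peak value.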
Why it matters
Provides a memory-efficient, multi-modal 3D reconstruction framework critical for AR/VR, robotics, and autonomous navigation applications.
Abstract
Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated impressive progress in high-fidelity scene reconstruction within visual SLAM. However, existing approaches often suffer from scene inconsistency, leading to visual artifacts, and the explicit maintenance of millions of Gaussians imposes significant storage overhead. To address these limitations, we present a unified Neural Gaussian SLAM with feature splatting, which represents the spatial scene as a coherent feature space while encoding view direction, distance, and position into neural Gaussians. Arbitrary image modalities, including color, depth, normals, semantics, and even language, can be decoded from this feature space. Extensive evaluations on several challenging datasets show that our method achieves state-of-the-art performance in rendering quality, reconstruction accuracy, and pose estimation.