ICRA 2026

Unified Neural Gaussian SLAM with Feature Splatting

Xuyang Tang, Henry Chu, Yuxiang Sun

AI summary

A unified SLAM system that encodes diverse scene modalities into a single coherent feature space, achieving state-of-the-art rendering, reconstruction, and tracking while reducing memory overhead.
3D Gaussian Splatting, Visual SLAM, Feature Splatting, Multi-modal Reconstruction, Scene Representation

Problem

Current 3D Gaussian Splatting SLAM methods suffer from scene inconsistency, which produces visual artifacts, and from high memory costs, because they explicitly maintain millions of independent Gaussians. They also fail to exploit correlations between different image modalities.

Approach

The method represents the scene as a continuous multi-scale feature plane that encodes spatial, directional, and distance information. From this representation it generates Gaussian parameters and high-dimensional feature embeddings, which are rasterized and then decoded into multiple output modalities such as color, depth, normals, and semantics.
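
To make the data flow described above concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a stack of 2D feature planes at three resolutions, bilinear lookup at each anchor position, and a single linear layer standing in for the decoder. The names `bilinear_sample`, `decode_gaussians`, and `w_out`, the plane resolutions, and the feature sizes are all hypothetical, and rasterization is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def bilinear_sample(plane, uv):
    """Bilinearly sample an (H, W, C) plane at coords uv in [0, 1]^2, shape (N, 2)."""
    h, w, _ = plane.shape
    x = uv[:, 0] * (w - 1)
    y = uv[:, 1] * (h - 1)
    # Index of the top-left neighbour, clipped so that x0 + 1 and y0 + 1 stay in range.
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    tx = (x - x0)[:, None]
    ty = (y - y0)[:, None]
    top = plane[y0, x0] * (1 - tx) + plane[y0, x0 + 1] * tx
    bottom = plane[y0 + 1, x0] * (1 - tx) + plane[y0 + 1, x0 + 1] * tx
    return top * (1 - ty) + bottom * ty


def decode_gaussians(planes, anchors_uv, anchors_xyz, cam_pos, w_out):
    """Turn plane features plus view direction and distance into Gaussian attributes.

    Assumption: a single linear map `w_out` replaces whatever decoder the paper uses.
    """
    # Spatial information: concatenate features sampled from every scale.
    feats = np.concatenate([bilinear_sample(p, anchors_uv) for p in planes], axis=1)
    # Directional and distance information relative to the camera.
    offset = anchors_xyz - cam_pos
    dist = np.linalg.norm(offset, axis=1, keepdims=True)
    view_dir = offset / np.maximum(dist, 1e-8)
    raw = np.concatenate([feats, view_dir, dist], axis=1) @ w_out
    opacity = 1.0 / (1.0 + np.exp(-raw[:, :1]))  # sigmoid keeps opacity in (0, 1)
    scale = np.exp(raw[:, 1:4])                  # exp keeps scales positive
    embedding = raw[:, 4:]                       # feature vector to be splatted
    return opacity, scale, embedding


# Hypothetical sizes: three plane resolutions, 4 channels each, 16-D embedding.
planes = [rng.normal(size=(r, r, 4)) for r in (8, 16, 32)]
anchors_xyz = rng.uniform(0.0, 1.0, size=(5, 3))
cam_pos = np.array([0.5, 0.5, -2.0])
w_out = rng.normal(scale=0.1, size=(3 * 4 + 3 + 1, 1 + 3 + 16))

opacity, scale, embedding = decode_gaussians(
    planes, anchors_xyz[:, :2], anchors_xyz, cam_pos, w_out
)
print(opacity.shape, scale.shape, embedding.shape)
```

In this sketch the per-Gaussian attributes are recomputed from shared planes for each view instead of being stored individually, which is one plausible reading of how such a representation could reduce storage. The per-modality decoders applied after rasterization are not shown.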

Key results

  • Unified latent feature space enabling multi-modal decoding
  • State-of-the-art rendering quality with +0.89 dB PSNR gain on Replica
  • Improved mesh reconstruction with sharper edges and lower depth L1 error
  • Competitive tracking accuracy with reduced trajectory error on standard benchmarks
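
For readers unfamiliar with the metrics named in these results, the sketch below shows the standard definitions of PSNR and depth L1 error in NumPy. It is an illustration with made-up arrays, not the paper's evaluation code; the function names, the `max_val` default, and the convention that a ground-truth depth of 0 marks an invalid pixel are assumptions.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


def depth_l1(pred_depth, gt_depth):
    """Mean absolute depth error over pixels with valid ground truth.

    Assumption: a ground-truth depth of 0 marks a missing measurement.
    """
    valid = gt_depth > 0
    return np.mean(np.abs(pred_depth[valid] - gt_depth[valid]))


# Toy image pair: a constant error of 0.1 on a [0, 1] image.
target = np.zeros((4, 4))
pred = np.full((4, 4), 0.1)
toy_psnr = psnr(pred, target)

# Toy depth pair: one invalid pixel (ground truth 0) is ignored.
gt = np.array([[1.0, 2.0], [0.0, 4.0]])
est = np.array([[1.1, 1.8], [5.0, 4.0]])
toy_l1 = depth_l1(est, gt)

# For a single image at a fixed peak value, a PSNR gain of g dB
# corresponds to multiplying the MSE by 10 ** (-g / 10).
mse_ratio = 10 ** (-0.89 / 10)
print(toy_psnr, toy_l1, mse_ratio)
```

Under that per-image relationship, a +0.89 dB gain corresponds to an MSE roughly 18 to 19 percent lower. When PSNR is averaged over many frames the correspondence is only approximate.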

Why it matters

Provides a memory-efficient, multi-modal 3D reconstruction framework critical for AR/VR, robotics, and autonomous navigation applications.

Abstract

Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated impressive progress in high-fidelity scene reconstruction within visual SLAM. However, existing approaches often suffer from scene inconsistency, leading to visual artifacts, and the explicit maintenance of millions of Gaussians imposes significant storage overhead. To address these limitations, we present a unified Neural Gaussian SLAM with feature splatting, which represents the spatial scene as a coherent feature space while encoding view direction, distance, and position into neural Gaussians. Arbitrary image modalities, including color, depth, normals, semantics, and even language, can be decoded from this feature space. Extensive evaluations on several challenging datasets show that our method achieves state-of-the-art performance in rendering quality, reconstruction accuracy, and pose estimation.

Index terms

SLAM, Localization, Mapping
