SemGS: Feed-Forward Semantic 3D Gaussian Splatting from Sparse Views for Generalizable Scene Understanding
Sheng Ye, Zhen-Hui Dong, Ruoyu Fan, Tian Lv, Yong-Jin Liu
AI summary
Problem
Existing semantic scene reconstruction methods rely on dense multi-view inputs and require slow, scene-specific optimization, limiting their scalability and real-world applicability.
Approach
SemGS uses a feed-forward dual-branch network with camera-aware attention to extract color and semantic features from sparse views, decodes them into shared-geometry dual Gaussians, and rasterizes them to render semantic maps in a single pass.
Key results
- State-of-the-art mIoU and accuracy on ScanNet and ScanNet++ benchmarks
- Rapid inference speeds exceeding 6 FPS without per-scene optimization
- Strong cross-domain generalization to synthetic and real-world scenes
- Regional smoothness loss improves semantic coherence and boundary sharpness
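For reference, the mIoU and accuracy figures cited above are standard segmentation metrics. The following is a minimal sketch of how they are commonly computed over flattened label maps. The function names, the `ignore_label` convention, and the choice to average only over classes that occur are assumptions for illustration, not the paper's evaluation code.

```python
def mean_iou(pred, target, num_classes, ignore_label=-1):
    """Mean intersection-over-union across classes present in pred or target.

    pred, target: flat sequences of integer class labels of equal length.
    Pixels whose target equals ignore_label are skipped (assumed convention).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            # Pixel counts once toward both intersection and union of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Mismatch: pixel joins the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


def pixel_accuracy(pred, target, ignore_label=-1):
    """Fraction of non-ignored pixels whose predicted label is correct."""
    pairs = [(p, t) for p, t in zip(pred, target) if t != ignore_label]
    if not pairs:
        return 0.0
    return sum(1 for p, t in pairs if p == t) / len(pairs)


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    # Per-class IoU: 1/3, 2/3, 1/2, so the mean is 0.5.
    print(mean_iou(pred, target, num_classes=3))
    print(pixel_accuracy(pred, target))
```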
Why it matters
Provides robots and vision systems with a scalable, real-time tool for high-level semantic understanding in unknown environments.
Abstract
Semantic understanding of 3D scenes is essential for robots to operate effectively and safely in complex environments. Existing methods for semantic scene reconstruction and semantic-aware novel view synthesis often rely on dense multi-view inputs and require scene-specific optimization, limiting their practicality and scalability in real-world applications. To address these challenges, we propose SemGS, a feed-forward framework for reconstructing generalizable semantic fields from sparse image inputs. SemGS uses a dual-branch architecture to extract color and semantic features, where the two branches share shallow CNN layers, allowing semantic reasoning to leverage textural and structural cues in color appearance. We also incorporate a camera-aware attention mechanism into the feature extractor to explicitly model geometric relationships between camera viewpoints. The extracted features are decoded into dual-Gaussians that share geometric consistency while preserving branch-specific attributes, and further rasterized to synthesize semantic maps under novel viewpoints. Additionally, we introduce a regional smoothness loss to enhance semantic coherence. Experiments show that SemGS achieves state-of-the-art performance on benchmark datasets, while providing rapid inference and strong generalization capabilities across diverse synthetic and real-world scenarios.
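The dual-Gaussian idea in the abstract (shared geometry, branch-specific attributes) can be illustrated with a small sketch. Everything here is hypothetical: the class names, the exact split of parameters between the shared and per-branch parts, and the simplification that each Gaussian's per-pixel alpha has already been computed from the shared geometry. The abstract does not specify these details. The sketch only shows why sharing geometry means color and semantics are blended with identical weights along a ray.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SharedGeometry:
    """Geometric parameters assumed to be common to both branches."""
    mean: Tuple[float, float, float]
    scale: Tuple[float, float, float]
    rotation: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)


@dataclass
class DualGaussian:
    """One primitive with shared geometry and two branch-specific attributes."""
    geometry: SharedGeometry
    color: Tuple[float, float, float]   # color-branch attribute
    semantic_logits: List[float]        # semantic-branch attribute


def composite(values: Sequence[Sequence[float]], alphas: Sequence[float]) -> List[float]:
    """Front-to-back alpha compositing of per-Gaussian attribute vectors."""
    out = [0.0] * len(values[0])
    transmittance = 1.0
    for value, alpha in zip(values, alphas):
        weight = transmittance * alpha
        for k, v in enumerate(value):
            out[k] += weight * v
        transmittance *= 1.0 - alpha
    return out


def render_pixel(gaussians: Sequence[DualGaussian], alphas: Sequence[float]):
    """Blend color and semantics for one pixel using the same weights.

    `gaussians` must be sorted front to back along the ray. `alphas` holds each
    Gaussian's contribution at this pixel, assumed to be precomputed from the
    shared geometry.
    """
    color = composite([g.color for g in gaussians], alphas)
    logits = composite([g.semantic_logits for g in gaussians], alphas)
    label = max(range(len(logits)), key=lambda k: logits[k])
    return color, label


if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 0.0)
    front = DualGaussian(SharedGeometry((0.0, 0.0, 1.0), (0.1, 0.1, 0.1), identity),
                         color=(1.0, 0.0, 0.0), semantic_logits=[2.0, 0.0])
    back = DualGaussian(SharedGeometry((0.0, 0.0, 2.0), (0.1, 0.1, 0.1), identity),
                        color=(0.0, 0.0, 1.0), semantic_logits=[0.0, 2.0])
    print(render_pixel([front, back], alphas=[0.5, 0.5]))
```

Because both attribute sets pass through the same `composite` weights, the semantic map stays aligned with the rendered color image by construction, which is one plausible reading of "geometric consistency" in the abstract.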