ROOM-3D: Real-Time Unsupervised Online 3D Room Segmentation
Rafael Flor Rodríguez-Rabadán, Carlos Gutiérrez Álvarez, Alexis Bañuls-González, Sergio Lafuente-Arroyo, Saturnino Maldonado-Bascón, Roberto J. López-Sastre
AI summary
Problem
Existing room segmentation methods rely on offline, complete scene reconstruction, making them unsuitable for real-time robotic navigation and incremental exploration.
Approach
The framework combines Gaussian SLAM with SAM and CLIP to incrementally build an open-vocabulary 3D map. It fuses three cue maps (wall occupancy, transition, and context) to detect room boundaries and navigational transitions on the fly.
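To make the cue-map fusion concrete, here is a minimal sketch and not the authors' implementation. It assumes the three cues are available as equally sized 2D grids with values in [0, 1], that each cue votes for "this cell is a room boundary", and that rooms are the connected regions left after thresholding. The function names, weights, and threshold are all hypothetical.

```python
from collections import deque


def fuse_cue_maps(wall, transition, context, weights=(0.4, 0.4, 0.2)):
    """Per-cell weighted sum of three 2D grids (assumed fusion rule)."""
    w_wall, w_tr, w_ctx = weights
    return [
        [w_wall * a + w_tr * b + w_ctx * c for a, b, c in zip(ra, rb, rc)]
        for ra, rb, rc in zip(wall, transition, context)
    ]


def label_rooms(boundary_score, threshold=0.3):
    """Label 4-connected regions of cells scoring below the threshold.

    Returns a grid holding 0 for boundary cells and 1..K for room ids.
    """
    rows, cols = len(boundary_score), len(boundary_score[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if boundary_score[r][c] >= threshold or labels[r][c]:
                continue  # boundary cell, or already assigned to a room
            next_id += 1
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one room
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and labels[ny][nx] == 0
                        and boundary_score[ny][nx] < threshold
                    ):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels
```

In an online setting, a routine like this would be re-run (or updated locally) each time new observations change the cue maps, which is why a cheap per-cell rule is used in the sketch.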
Key results
- Formalized the novel online 3D room segmentation problem
- Achieved state-of-the-art online segmentation (m-iF1 = 85.44%) and transition detection (m-iTrF1 = 56.76%)
- Outperformed offline baselines in recall (89.17%) while maintaining competitive precision
- Runs in real time at 1.5 FPS; an ablation confirms that all three semantic cue maps are essential
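The metric names above are not defined in this summary. As a rough illustration only, the sketch below assumes that an "instantaneous" F1 is the standard F1 score computed at a single time step, and that the "m-" prefix denotes its mean over the trajectory. The function names and the (tp, fp, fn) input format are hypothetical, and the paper's actual definitions may differ.

```python
def f1_score(tp, fp, fn):
    """Standard F1 from counts; returns 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_instantaneous_f1(per_step_counts):
    """Average the per-step F1 over a sequence of (tp, fp, fn) tuples."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in per_step_counts]
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging per-step scores, rather than scoring only the final map, penalizes a method that is wrong for most of the exploration and only correct at the end.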
Why it matters
Enables service robots, inspection drones, and assistive systems to understand and navigate complex indoor environments in real time without prior maps or offline processing.
Abstract
Room-level understanding is essential for mobile robots operating in indoor environments. Existing room segmentation methods assume an offline setting that requires a complete scene reconstruction before producing the final result, which limits their applicability to real-time robotic navigation. We introduce the novel problem of online 3D room segmentation, where a robot must continuously segment rooms and detect transitions from streaming observations during exploration, and propose ROOM-3D: a real-time unsupervised framework that combines Gaussian-based SLAM with open-vocabulary semantic reasoning to incrementally build a 3D room segmentation without access to future observations or global post-processing. We also introduce instantaneous evaluation metrics tailored to this online setting. Experiments on HM3D-Semantics demonstrate temporally consistent, accurate segmentation under strict online constraints, with state-of-the-art results in the offline evaluation too.