ICRA 2026

ROOM-3D: Real-Time Unsupervised Online 3D Room Segmentation

Rafael Flor Rodríguez-Rabadán, Carlos Gutiérrez Álvarez, Alexis Bañuls-González, Sergio Lafuente-Arroyo, Saturnino Maldonado-Bascón, Roberto J. López-Sastre


AI summary

ROOM-3D enables real-time, unsupervised 3D room segmentation and transition detection during robot navigation using streaming RGB-D data and open-vocabulary semantic reasoning.
Online 3D room segmentation, Gaussian SLAM, Open-vocabulary semantics, Real-time navigation, Unsupervised mapping, Robot perception

Problem

Existing room segmentation methods rely on offline, complete scene reconstruction, making them unsuitable for real-time robotic navigation and incremental exploration.

Approach

The framework combines Gaussian SLAM with SAM and CLIP to incrementally build an open-vocabulary 3D map. It fuses three cue maps (wall occupancy, transition, and context) to detect room boundaries and navigational transitions on the fly.
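
To make the idea of fusing cue maps concrete, here is a minimal sketch in Python. It assumes each cue is a 2D grid of scores in [0, 1] and that fusion is a weighted per-cell sum followed by a threshold. The function name `fuse_cue_maps`, the weights, and the threshold are all hypothetical; this summary does not state the fusion rule ROOM-3D actually uses.

```python
from typing import List, Sequence

Grid = List[List[float]]


def fuse_cue_maps(
    wall: Grid,
    transition: Grid,
    context: Grid,
    weights: Sequence[float] = (0.5, 0.3, 0.2),  # assumed, not from the paper
    threshold: float = 0.5,                      # assumed, not from the paper
) -> List[List[bool]]:
    """Combine three same-sized cue grids into a boolean boundary mask.

    Each cell's fused score is a weighted sum of the three cues; a cell is
    marked as a boundary candidate when the score reaches the threshold.
    """
    w_wall, w_trans, w_ctx = weights
    mask: List[List[bool]] = []
    for wall_row, trans_row, ctx_row in zip(wall, transition, context):
        mask.append([
            (w_wall * a + w_trans * b + w_ctx * c) >= threshold
            for a, b, c in zip(wall_row, trans_row, ctx_row)
        ])
    return mask


# Toy 2x2 grids, purely illustrative values.
wall = [[1.0, 0.0], [0.2, 0.9]]
transition = [[0.0, 0.0], [1.0, 0.1]]
context = [[0.5, 0.0], [1.0, 0.0]]
boundary = fuse_cue_maps(wall, transition, context)
```

In this toy case a strong wall response alone is not enough to reach the threshold (bottom-right cell), while agreement between the transition and context cues is (bottom-left cell). That is the sense in which a single cue can be insufficient on its own.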

Key results

  • Formalized the novel online 3D room segmentation problem
  • Achieved state-of-the-art online segmentation (m-iF1 = 85.44%) and transition detection (m-iTrF1 = 56.76%)
  • Outperformed offline baselines in recall (89.17%) while maintaining competitive precision
  • Operates in real time at 1.5 FPS; an ablation study confirms that all three semantic cue maps are essential
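
The online metrics above are described only by name here. As a rough illustration, the sketch below assumes an "instantaneous" F1 is computed at each timestep from true-positive, false-positive, and false-negative counts and then averaged over the trajectory. How the paper matches predicted rooms to ground truth, and how it averages, is not stated in this summary, so `mean_instantaneous_f1` and its inputs are hypothetical.

```python
from typing import List, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from counts; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_instantaneous_f1(counts: List[Tuple[int, int, int]]) -> float:
    """Average the per-timestep F1 over a sequence of (tp, fp, fn) counts.

    Assumption: every timestep is weighted equally.
    """
    if not counts:
        return 0.0
    return sum(f1_score(tp, fp, fn) for tp, fp, fn in counts) / len(counts)


# Three illustrative timesteps: (tp, fp, fn) at each step.
steps = [(8, 2, 2), (5, 5, 0), (0, 3, 4)]
score = mean_instantaneous_f1(steps)
```

Averaging per-step scores penalizes a method that is only correct once the whole scene has been seen, which is the behavior an online metric would be expected to capture.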

Why it matters

Enables service robots, inspection drones, and assistive systems to understand and navigate complex indoor environments in real time without prior maps or offline processing.

Abstract

Room-level understanding is essential for mobile robots operating in indoor environments. Existing room segmentation methods assume an offline setting, requiring a complete scene reconstruction before producing the final result, which limits their applicability to real-time robotic navigation. We introduce the novel problem of online 3D room segmentation, where a robot must continuously segment rooms and detect transitions from streaming observations during exploration, and propose ROOM-3D: a real-time unsupervised framework that combines Gaussian-based SLAM with open-vocabulary semantic reasoning to incrementally build a 3D room segmentation without access to future observations or global post-processing. We also introduce instantaneous evaluation metrics tailored to this online setting. Experiments on HM3D-Semantics demonstrate temporally consistent, accurate segmentation under strict online constraints, with state-of-the-art results in the offline evaluation too.

Index terms

Semantic Scene Understanding; Object Detection, Segmentation and Categorization; Mapping
