ICRA 2026

DSSM-SG: Dynamic 3D Scene Graphs with Spatio-Semantic Memory for Long-Term Indoor Navigation Tasks

Yi Ruan, Yaowen Zhang, Miaoxin Pan, Yi Yang, Mengyin Fu

AI summary

DSSM-SG enables robust long-term indoor navigation by dynamically updating spatio-semantic 3D scene graphs and grounding ambiguous language instructions via LLM-driven subgraph matching.

Dynamic scene graphs, Indoor navigation, Spatio-semantic memory, Language grounding, LLM, Open-vocabulary perception

Problem

Most existing scene graph and navigation methods assume static environments and therefore fail to adapt to moving objects and changing layouts, which hinders reliable long-term, language-guided robot navigation.

Approach

The framework constructs a dynamic open-vocabulary 3D scene graph using viewpoint-based dynamic detection and spatio-semantic memory, updates it incrementally during re-navigation, and matches LLM-generated target subgraphs to local scene observations for precise goal grounding.
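The construction and update steps above can be pictured with a small data structure. The sketch below is a hedged illustration under our own assumptions, not the authors' implementation. It assumes a two-layer graph (waypoints linked by traversability edges, plus object nodes with 2D positions), reads "viewpoint-based dynamic detection" as comparing what is re-detected at a revisited waypoint against the objects previously seen from that same waypoint, and matches detections by label and distance only. The names `SceneGraph`, `ObjectNode`, `observe`, `seen_from` and the 0.5 m match radius are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    label: str
    position: tuple[float, float]
    state: str = "static"  # one of "static", "moved", "missing"


@dataclass
class SceneGraph:
    """Hypothetical two-layer graph: waypoint topology plus object nodes."""
    waypoints: dict[int, tuple[float, float]] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    objects: dict[int, ObjectNode] = field(default_factory=dict)
    # Viewpoint memory: waypoint id -> ids of objects seen from that waypoint.
    seen_from: dict[int, set[int]] = field(default_factory=dict)
    _next_id: int = 0

    def add_waypoint(self, wp, pos, neighbors=()):
        self.waypoints[wp] = pos
        self.seen_from.setdefault(wp, set())
        for n in neighbors:
            self.edges.add((min(wp, n), max(wp, n)))  # undirected edge

    def observe(self, wp, detections, radius=0.5):
        """Incrementally update the graph from (label, position) detections at wp."""
        expected = self.seen_from.setdefault(wp, set())
        matched = set()
        for label, pos in detections:
            # Same-label objects anywhere in the graph not yet matched, nearest first.
            candidates = sorted(
                (math.dist(node.position, pos), oid)
                for oid, node in self.objects.items()
                if node.label == label and oid not in matched
            )
            if candidates and candidates[0][0] <= radius:
                oid = candidates[0][1]
                self.objects[oid].state = "static"  # re-observed in place
            elif candidates:
                oid = candidates[0][1]
                node = self.objects[oid]
                node.position, node.state = pos, "moved"  # same instance, new place
                # Forget stale viewpoints so the old spot is not flagged as missing.
                for seen in self.seen_from.values():
                    seen.discard(oid)
            else:
                oid = self._next_id  # previously unseen object
                self._next_id += 1
                self.objects[oid] = ObjectNode(label, pos)
            matched.add(oid)
            expected.add(oid)
        # Objects remembered from this viewpoint but not re-detected now.
        for oid in expected - matched:
            self.objects[oid].state = "missing"
        return matched
```

For example, a chair first seen from waypoint 0 and later detected about 3.5 m away from waypoint 1 is marked `moved` and dropped from waypoint 0's viewpoint memory, so a later visit to waypoint 0 does not also flag it as missing. Treating any same-label object outside the radius as the moved instance is a deliberate simplification; a real system would need appearance features to tell two chairs apart.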

Key results

Surpasses baselines in static and dynamic object construction accuracy (up to 91.5% F1)
Achieves 100% precision in tracking moving objects across dynamic states
Enables robust incremental graph updates that recover from perception errors and adapt to layout changes
Improves language-guided navigation success through LLM-driven subgraph matching and topological alignment
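The accuracy figures above are reported as precision and F1. This page does not state how a constructed or tracked object is matched to ground truth, so the definitions below are only the standard ones, written in terms of true positives $TP$, false positives $FP$, and false negatives $FN$ under whatever matching criterion the paper uses:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
```

Under these definitions, 100% precision means $FP = 0$ (no spurious moving-object tracks); on its own it says nothing about $FN$, that is, about moved objects that were missed.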

Why it matters

Provides autonomous robots with a scalable, adaptive mapping and navigation solution for real-world indoor environments where objects and layouts frequently change.

Abstract

Dynamic indoor environments pose significant challenges for autonomous robots, as objects frequently move and scenes continuously change, requiring robust scene representation and adaptive navigation strategies. In this work, we introduce DSSM-SG, a dynamic open-vocabulary 3D scene graph framework enhanced with spatio-semantic memory, to support complex language instruction parsing and goal navigation in dynamic environments. First, we construct a multi-layered scene graph by combining waypoint topology with semantic object information, and propose a viewpoint-based mechanism to model object dynamics and detect scene changes, enabling more precise semantic-geometric representation. Second, we design an efficient incremental graph update strategy that adapts to object-level dynamics and navigation-observed obstacles, thereby maintaining graph consistency and alleviating mismatches during re-navigation. Finally, we introduce a subgraph generation and matching approach driven by large language models, significantly improving the system’s ability to interpret and ground ambiguous goal descriptions. Experimental results demonstrate that DSSM-SG achieves superior performance in scene graph accuracy, update efficiency, and language goal navigation success compared to existing baselines in dynamic indoor environments.
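The LLM-driven subgraph matching step can be illustrated with a brute-force sketch. It is a hypothetical illustration, not the authors' method. It assumes the LLM has already turned an instruction such as "the mug on the table next to the sofa" into a target subgraph (hard-coded below; no LLM is called), that the local observation is available as labeled object ids plus relation triples, and that the best grounding is the label-consistent, one-to-one assignment satisfying the most target relations. The function name `match_subgraph` and the relation names `on` and `near` are hypothetical.

```python
from itertools import product


def match_subgraph(target_nodes, target_edges, scene_labels, scene_edges):
    """Ground a target subgraph in a local scene graph by exhaustive search.

    target_nodes: {role: label}, e.g. {"goal": "mug", "t": "table"}
    target_edges: [(role_a, relation, role_b)]
    scene_labels: {object_id: label}
    scene_edges:  set of (object_id_a, relation, object_id_b)
    Returns (assignment, satisfied_edge_count), or (None, -1) when some
    target label is not observed at all.
    """
    roles = list(target_nodes)
    # Candidate scene objects for each role: those with the same label.
    candidates = [
        [oid for oid, lab in scene_labels.items() if lab == target_nodes[r]]
        for r in roles
    ]
    best, best_score = None, -1
    for combo in product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # one scene object cannot fill two roles
        assign = dict(zip(roles, combo))
        # Count target relations that hold between the assigned objects.
        score = sum(
            (assign[a], rel, assign[b]) in scene_edges
            for a, rel, b in target_edges
        )
        if score > best_score:
            best, best_score = assign, score
    return best, best_score


# Two mugs and two tables: only the relations disambiguate the goal.
target_nodes = {"goal": "mug", "t": "table", "s": "sofa"}
target_edges = [("goal", "on", "t"), ("t", "near", "s")]
scene_labels = {1: "mug", 2: "mug", 3: "table", 4: "table", 5: "sofa"}
scene_edges = {(1, "on", 3), (2, "on", 4), (4, "near", 5)}
print(match_subgraph(target_nodes, target_edges, scene_labels, scene_edges))
# → ({'goal': 2, 't': 4, 's': 5}, 2)
```

Exhaustive search grows exponentially with the number of target nodes, which is tolerable for the handful of nodes an instruction usually yields; larger subgraphs would call for a dedicated subgraph-isomorphism or assignment solver.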

Index terms

Semantic Scene Understanding, Embodied Cognitive Science, Motion and Path Planning