ICRA 2026

SSMG-Nav: Enhancing Lifelong Object Navigation with Semantic Skeleton Memory Graph

Haochen Niu, Lantao Zhang, Xingwu Ji, Rendong Ying, Peilin Liu, Fei Wen

AI summary

SSMG-Nav significantly boosts success rates and path efficiency in lifelong object navigation by unifying persistent spatial memory with vision-language reasoning and long-horizon planning.

Lifelong navigation, Semantic memory graph, Vision-language models, Long-horizon planning, Object navigation, Multimodal reasoning

Problem

Existing object navigation methods lack persistent, reusable memory, rely on single-modality inputs, and employ myopic greedy policies that cause inefficient back-and-forth maneuvers, limiting their effectiveness in lifelong settings.

Approach

The framework constructs a Semantic Skeleton Memory Graph anchored by topological keypoints to consolidate past observations, then uses a vision-language model to estimate target beliefs and a long-horizon planner to optimize visitation sequences and minimize backtracking.

Key results

Novel Semantic Skeleton Memory Graph unifying entity and spatial semantics
Long-horizon planner balancing VLM belief and traversal costs
State-of-the-art performance on GOAT-Bench lifelong navigation benchmarks
Substantial gains in success rates and path efficiency over zero-shot baselines

Why it matters

Enables service robots to navigate more reliably and efficiently across diverse, unseen environments by leveraging reusable spatial memory and multimodal reasoning.

Abstract

Navigating to out-of-sight targets from human instructions in unfamiliar environments is a core capability for service robots. Despite substantial progress, most approaches underutilize reusable, persistent memory, constraining performance in lifelong settings. Many are additionally limited to single-modality inputs and employ myopic greedy policies, which often induce inefficient back-and-forth maneuvers (BFMs). To address such limitations, we introduce SSMG-Nav, a framework for object navigation built on a Semantic Skeleton Memory Graph (SSMG) that consolidates past observations into a spatially aligned, persistent memory anchored by topological keypoints (e.g., junctions, room centers). SSMG clusters nearby entities into subgraphs, unifying entity- and space-level semantics to yield a compact set of candidate destinations. To support multimodal targets (images, objects, and text), we integrate a vision-language model (VLM). For each subgraph, a multimodal prompt synthesized from memory guides the VLM to infer a target belief over destinations. A long-horizon planner then trades off this belief against traversability costs to produce a visit sequence that minimizes expected path length, thereby reducing backtracking. Extensive experiments on challenging lifelong benchmarks and standard ObjectNav benchmarks demonstrate that, compared to strong baselines, our method achieves higher success rates and greater path efficiency, validating the effectiveness of SSMG-Nav.
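The belief-versus-cost trade-off described in the abstract can be illustrated with a small sketch. This is not the paper's planner: it assumes a handful of candidate destinations, a normalized belief per destination, and symmetric pairwise travel costs, and it simply enumerates all visit orders to find the one with the lowest expected path length. The names `expected_path_length`, `plan_visit_sequence`, `greedy_sequence`, and the toy 1-D layout are hypothetical and chosen only for illustration.

```python
from itertools import permutations

def expected_path_length(order, start, belief, cost):
    """Expected distance travelled until the target is found.

    Assumes the target is at exactly one destination, with probability
    belief[d], and that the robot stops as soon as it reaches it.
    """
    remaining = 1.0  # probability that the target has not been found yet
    expected = 0.0
    prev = start
    for node in order:
        # This leg is only travelled if all earlier destinations failed.
        expected += remaining * cost[(prev, node)]
        remaining -= belief[node]
        prev = node
    return expected

def plan_visit_sequence(start, belief, cost):
    """Brute-force search over visit orders (only viable for few candidates)."""
    total = sum(belief.values())
    norm = {d: b / total for d, b in belief.items()}  # normalize beliefs
    best = min(permutations(norm),
               key=lambda o: expected_path_length(o, start, norm, cost))
    return list(best), expected_path_length(best, start, norm, cost)

def greedy_sequence(belief):
    """Myopic baseline: always go to the highest-belief destination next."""
    return sorted(belief, key=belief.get, reverse=True)

# Toy layout (hypothetical): robot and destinations on a 1-D corridor.
positions = {"start": 0.0, "A": 2.0, "B": -2.0, "C": 3.0}
belief = {"A": 0.40, "B": 0.35, "C": 0.25}
cost = {(a, b): abs(positions[a] - positions[b])
        for a in positions for b in positions}

planned_order, planned_length = plan_visit_sequence("start", belief, cost)
greedy_order = greedy_sequence(belief)
greedy_length = expected_path_length(greedy_order, "start", belief, cost)
```

In this toy layout the greedy order (A, B, C) crosses the corridor twice, while the enumerated plan visits A and then the nearby C before turning back to B, giving an expected path length of about 4.35 against about 5.65 for the greedy order. Exhaustive enumeration grows factorially with the number of destinations, so a practical planner would need a more scalable search.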

Index terms

Vision-Based Navigation, Task Planning