ICRA 2026

DIPP: A Diffusion-Based Potential Planner for Synergistic Navigation and Mapping

Yiqing Zhang, Tao Wang, Miaoxin Pan, Yi Yang, Mengyin Fu

AI summary

Key figure (auto-extracted from paper)
DIPP unifies navigation and mapping via a dual-channel diffusion model, enabling agents to build persistent spatial knowledge while efficiently finding object goals.
Object-Goal Navigation, Diffusion Models, Topological Mapping, Embodied AI, Potential Fields, Long-Horizon Planning

Problem

Current Object-Goal Navigation methods discard environmental knowledge after each episode, limiting long-term autonomy and forcing agents to re-learn layouts from scratch.

Approach

DIPP employs a dual-channel conditional diffusion model to simultaneously generate a goal-directed navigation potential and a structural topological potential, enabling incremental construction of a persistent topological graph for strategic waypoint selection.
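As a rough illustration of how two potential fields might be combined for waypoint selection, the sketch below fuses a navigation map and a topological map with a weighted sum and picks the highest-scoring cell. The fusion rule, the weight `alpha`, the function name `select_waypoint`, and the argmax selection are all assumptions made for illustration; the summary does not state how DIPP actually fuses the two channels.

```python
import numpy as np


def select_waypoint(nav_potential, topo_potential, alpha=0.5):
    """Fuse two potential maps and return the (row, col) of the best cell.

    Hypothetical helper: the weighted sum and argmax are illustrative
    assumptions, not the fusion rule used in the paper.
    """
    nav = np.asarray(nav_potential, dtype=float)
    topo = np.asarray(topo_potential, dtype=float)
    if nav.shape != topo.shape:
        raise ValueError("potential maps must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Convex combination: alpha = 0 uses only the navigation channel,
    # alpha = 1 uses only the topological channel.
    fused = (1.0 - alpha) * nav + alpha * topo

    # Pick the grid cell with the highest fused potential.
    row, col = np.unravel_index(np.argmax(fused), fused.shape)
    return int(row), int(col)


if __name__ == "__main__":
    nav = [[0.1, 0.9], [0.2, 0.3]]
    topo = [[0.0, 0.0], [0.0, 1.0]]
    print(select_waypoint(nav, topo, alpha=0.0))
    print(select_waypoint(nav, topo, alpha=0.5))
```

With `alpha=0.0` the choice follows the navigation potential alone; raising `alpha` lets the structural prior pull the waypoint toward cells the topological channel marks as important.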

Key results

  • Strong performance on standard ObjectNav metrics (SR, SPL) in Habitat/Gibson
  • High Node Recall for structurally accurate persistent topological maps
  • Hierarchical graph-based planning significantly boosts long-horizon navigation
  • Novel two-stage curriculum enables synergistic potential field generation
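For readers unfamiliar with the two ObjectNav metrics named above, the following sketch computes Success Rate (SR) and Success weighted by Path Length (SPL) using their standard definitions from the embodied-navigation literature. The function names and the episode values in the example are made up for illustration and are not results from the paper; Node Recall is omitted because this summary does not define it.

```python
def success_rate(successes):
    """Fraction of episodes in which the agent reached the goal."""
    if not successes:
        raise ValueError("need at least one episode")
    return sum(1 for s in successes if s) / len(successes)


def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length.

    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i is the
    success indicator, l_i the shortest-path length to the goal, and
    p_i the length of the path the agent actually took.
    Assumes every l_i is strictly positive.
    """
    if not successes:
        raise ValueError("need at least one episode")
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        if s:
            # A successful episode scores 1.0 only if the path was optimal.
            total += l / max(p, l)
    return total / len(successes)


if __name__ == "__main__":
    # Illustrative episodes only: (success, shortest length, actual length).
    successes = [True, False, True]
    shortest = [10.0, 5.0, 8.0]
    taken = [20.0, 7.0, 8.0]
    print(success_rate(successes))
    print(spl(successes, shortest, taken))
```

SPL penalizes successful but inefficient episodes, which is why it is usually reported alongside SR rather than instead of it.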

Why it matters

Enables long-term autonomous operation for embodied agents by providing a reusable, abstract understanding of unseen environments, bridging the gap between reactive navigation and strategic spatial reasoning.

Abstract

Object-Goal Navigation (ObjectNav) requires an embodied agent to search for and reach a target object category in previously unseen environments using only onboard egocentric observations, which is a fundamental capability for long-horizon autonomous robots. Current Object-Goal Navigation methods typically discard environmental knowledge after each episode, limiting their ability to operate autonomously over long horizons. To overcome this limitation, we introduce DIPP, a diffusion-based potential planner that unifies navigation and mapping. DIPP generates two complementary potential fields: a navigation potential that directs the agent toward the target and a topological potential that captures the environment’s structural skeleton. The topological potential serves a dual purpose: it acts as an implicit structural prior for waypoint selection when fused directly with the navigation potential and, more importantly, enables the incremental construction of a persistent, explicit topological graph. This graph enables a hierarchical policy to select strategic, long-horizon waypoints, elevating planning from a tactical search to a strategic decision. We evaluate DIPP in the Habitat simulator on the Gibson dataset. Results show that DIPP achieves strong performance on standard ObjectNav metrics (SR, SPL) while constructing structurally accurate maps, evidenced by a high Node Recall score. Furthermore, leveraging the explicit persistent graph for hierarchical planning significantly boosts navigation performance. These findings demonstrate the effectiveness of DIPP in enabling embodied agents to build and exploit persistent spatial knowledge for long-term operation in unseen environments.

Index terms

AI-Based Methods, Semantic Scene Understanding, Autonomous Agents