ICRA 2026

NavDP: Learning Sim-To-Real Navigation Diffusion Policy with Privileged Information Guidance

Wenzhe Cai, Jiaqi Peng, Yuqiang Yang, Yujian Zhang, Meng Wei, Hanqing Wang, Yilun Chen, Tai Wang, Jiangmiao Pang

AI summary

NavDP enables zero-shot sim-to-real robot navigation across diverse embodiments by training a diffusion policy entirely in simulation using privileged safety guidance.

Navigation Diffusion Policy, Sim-to-Real Transfer, End-to-End Navigation, Privileged Information, Cross-Embodiment Generalization, Robot Learning

Problem

Existing learning-based navigation methods struggle with scalability and generalization due to reliance on scarce real-world data and cascaded modular architectures that suffer from compounding errors and latency.

Approach

The authors propose an end-to-end transformer-based diffusion policy trained exclusively in simulation that jointly generates trajectories and predicts safety scores using local RGB-D inputs and privileged simulation metrics like ESDF.
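To make the role of privileged information concrete, here is a minimal Python sketch, not the authors' implementation, of how an ESDF (Euclidean signed distance field) available in simulation could be turned into a safety score for candidate trajectories. The grid layout, the nearest-cell lookup, the minimum-clearance score, and the function names `min_clearance` and `safest_trajectory` are all assumptions made for illustration. In NavDP itself the scores are predicted by the learned network from RGB-D input, with simulation metrics serving only as supervision.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in metres, in the grid's frame (assumed)


def min_clearance(esdf: Sequence[Sequence[float]],
                  traj: Sequence[Point],
                  resolution: float = 1.0) -> float:
    """Smallest ESDF value (distance to nearest obstacle) over the waypoints.

    Hypothetical helper: `esdf[row][col]` holds the clearance of one grid cell,
    with row indexing y and col indexing x.
    """
    rows, cols = len(esdf), len(esdf[0])
    clearance = float("inf")
    for x, y in traj:
        # Nearest-cell lookup (an assumption; interpolation is also possible).
        r, c = int(round(y / resolution)), int(round(x / resolution))
        if not (0 <= r < rows and 0 <= c < cols):
            return 0.0  # treat leaving the known map as unsafe
        clearance = min(clearance, esdf[r][c])
    return clearance


def safest_trajectory(esdf: Sequence[Sequence[float]],
                      candidates: Sequence[Sequence[Point]],
                      resolution: float = 1.0) -> int:
    """Index of the candidate with the largest minimum clearance."""
    scores = [min_clearance(esdf, t, resolution) for t in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)
```

For example, on a grid whose clearance grows from left to right, a trajectory along the right-hand column scores higher than one along the left-hand column, so `safest_trajectory` returns its index.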

Key results

Achieves zero-shot sim-to-real transfer across four distinct robot platforms without real-world training data
Outperforms prior state-of-the-art methods by 6.3% in success rate and 4.0% in SPL (Success weighted by Path Length) in simulation, and by ~23% in real-world point-goal navigation
Delivers nearly 3x improvement in exploration time and area for no-goal navigation tasks compared to baselines
Introduces a scalable simulation data engine generating over 1 million meters of navigation experience across 3,000 scenes
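
The SPL figure above refers to Success weighted by Path Length, a standard navigation metric. It averages, over episodes, the success indicator multiplied by the ratio of the shortest-path length to the larger of the actual and shortest path lengths. A minimal Python sketch of the metric follows; the function name `spl` and its argument layout are illustrative assumptions and are not taken from the NavDP codebase.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest: Sequence[float],
        taken: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]: whether episode i reached the goal
    shortest[i]:  shortest-path length from start to goal
    taken[i]:     length of the path the agent actually travelled
    """
    if not (len(successes) == len(shortest) == len(taken)):
        raise ValueError("all inputs must have the same length")
    if not successes:
        raise ValueError("at least one episode is required")

    total = 0.0
    for ok, l, p in zip(successes, shortest, taken):
        if not ok:
            continue  # failed episodes contribute zero
        denom = max(p, l)
        # Degenerate episode with start == goal: counted as fully efficient
        # (a choice made for this sketch).
        total += l / denom if denom > 0 else 1.0
    return total / len(successes)
```

With three episodes, one optimal success, one success along a path twice the shortest length, and one failure, the result is (1 + 0.5 + 0) / 3 = 0.5.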

Why it matters

Enables scalable, cost-effective training of robust navigation policies that generalize across diverse robots and unstructured environments without expensive real-world data collection.

Abstract

Learning to navigate in dynamic and complex open-world environments is a critical yet challenging capability for autonomous robots. Existing approaches often rely on cascaded modular frameworks, which require extensive hyperparameter tuning or learning from limited real-world demonstration data. In this paper, we propose Navigation Diffusion Policy (NavDP), an end-to-end network trained solely in simulation that enables zero-shot sim-to-real transfer across diverse environments and robot embodiments. The core of NavDP is a unified transformer-based architecture that jointly learns trajectory generation and trajectory evaluation, both conditioned solely on local RGB-D observations. By learning to predict critic values for contrastive trajectory samples, our proposed approach effectively leverages supervision from privileged information available in simulation, thereby fostering accurate spatial understanding and enabling the distinction between safe and dangerous behaviors. To support this, we develop an efficient data generation pipeline in simulation and construct a large-scale dataset encompassing over one million meters of navigation experience across 3,000 scenes. Empirical experiments in both simulated and real-world environments demonstrate that NavDP significantly outperforms prior state-of-the-art methods. Furthermore, we identify key factors influencing the generalization performance of NavDP. The dataset and code are publicly available at https://wzcai99.github.io/navigation-diffusion-policy.github.io.

Index terms

Vision-Based Navigation, Imitation Learning, Collision Avoidance