ICRA 2026

StepNav: Structured Trajectory Priors for Efficient and Multimodal Visual Navigation

Xubo Luo, Aodi Wu, Haodong Han, Xue Wan∗, Wei Zhang, Leizheng Shu and Ruisuo Wang

AI summary

Replacing unstructured noise with a geometry-aware, multimodal trajectory prior enables safer, more efficient, and real-time visual navigation.

visual navigation, generative planning, trajectory priors, conditional flow matching, real-time robotics, multimodal planning

Problem

Generative visual navigation models typically initialize from unstructured noise, causing inefficient refinement, poor handling of perceptual ambiguity, and compromised safety for real-time robotics.

Approach

StepNav learns a success probability field from visual features to extract a structured, multimodal mixture of candidate paths, which initializes a regularized conditional flow-matching process for rapid trajectory refinement.
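The two stages described above, sampling from a mixture prior built around candidate paths and then refining the sample with a few integration steps, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than StepNav's implementation: the function names, the Gaussian form of the mixture components, the fixed-step Euler integrator, and the toy velocity field, which stands in for the learned conditional flow-matching network.

```python
import numpy as np

def sample_mixture_prior(candidates, weights, sigma, rng):
    """Draw one trajectory from a Gaussian mixture centred on candidate paths.

    candidates: (K, T, 2) array of K candidate paths with T waypoints each.
    weights:    (K,) mixture weights summing to 1 (hypothetical corridor scores).
    Returns the noisy sample and the index of the chosen mode.
    """
    k = rng.choice(len(candidates), p=weights)
    noise = sigma * rng.standard_normal(candidates[k].shape)
    return candidates[k] + noise, k

def refine(x0, velocity_fn, num_steps=5):
    """Integrate dx/dt = v(x, t) from t=0 towards t=1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

def make_toy_velocity(target):
    """Toy stand-in for a learned velocity field (assumption, not the paper's model).

    For a straight-line flow x_t = (1 - t) * x0 + t * x1, the velocity at x_t
    is (x1 - x_t) / (1 - t), so the sample is pulled onto the target path.
    """
    def velocity(x, t):
        return (target - x) / (1.0 - t)
    return velocity

# Two hypothetical candidates at an ambiguous junction: curve left or curve right.
rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 8)
left = np.stack([s, 0.5 * s**2], axis=-1)
right = np.stack([s, -0.5 * s**2], axis=-1)
candidates = np.stack([left, right])
weights = np.array([0.6, 0.4])

x0, k = sample_mixture_prior(candidates, weights, sigma=0.05, rng=rng)
refined = refine(x0, make_toy_velocity(candidates[k]), num_steps=5)
```

Because the sample starts close to a feasible path instead of at unstructured noise, only a small correction remains for the integrator, which is the intuition behind using few steps. Drawing repeatedly from the mixture yields samples from both modes, so neither option at the junction is discarded up front.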

Key results

95% success rate and lowest collision rate on indoor benchmarks
8.5 Hz real-time inference on edge hardware by reducing integration steps to 5
Effective resolution of ambiguous junctions via multimodal path candidates
Superior path efficiency and smoothness compared to state-of-the-art planners
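To put the reported inference rate in terms of a time budget, the short calculation below converts 8.5 Hz into a per-plan latency and divides it by the 5 integration steps. The per-step figure is only an upper bound under the assumption that the whole cycle is spent on integration; in practice perception and feature encoding also consume part of the cycle. The function name is hypothetical.

```python
def planning_budget_ms(rate_hz, num_steps):
    """Return (time per planning cycle, upper bound on time per integration step) in ms."""
    cycle_ms = 1000.0 / rate_hz
    return cycle_ms, cycle_ms / num_steps

cycle_ms, per_step_ms = planning_budget_ms(8.5, 5)
print(f"{cycle_ms:.1f} ms per plan, at most {per_step_ms:.1f} ms per integration step")
# prints "117.6 ms per plan, at most 23.5 ms per integration step"
```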

Why it matters

Provides a practical, high-performance planning framework for autonomous robots navigating complex, uncertain environments in real time.

Abstract

Visual navigation is fundamental to autonomous systems, yet generating reliable trajectories in cluttered and uncertain environments remains a core challenge. Recent generative models promise end-to-end synthesis, but their reliance on unstructured noise priors often yields unsafe, inefficient, or unimodal plans that cannot meet real-time requirements. We propose StepNav, a novel framework that bridges this gap by introducing structured, multimodal trajectory priors derived from variational principles. StepNav first learns a geometry-aware success probability field to identify all feasible navigation corridors. These corridors are then used to construct an explicit, multi-modal mixture prior that initializes a conditional flow-matching process. This refinement is formulated as an optimal control problem with explicit smoothness and safety regularization. By replacing unstructured noise with physically grounded candidates, StepNav generates safer and more efficient plans in significantly fewer steps. Experiments in both simulation and real-world benchmarks demonstrate consistent improvements in robustness, efficiency, and safety over state-of-the-art generative planners, advancing reliable trajectory generation for practical autonomous navigation. The code has been released at https://github.com/LuoXubo/StepNav.
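As a rough illustration of what "explicit smoothness and safety regularization" can look like for a discrete trajectory, the sketch below scores a path with a second-difference smoothness term and a hinge penalty on obstacle clearance. The specific terms, weights, margin, and function names are assumptions chosen for illustration; the abstract does not specify the form StepNav uses.

```python
import numpy as np

def smoothness_cost(traj):
    """Sum of squared second differences (discrete acceleration) along a (T, 2) path."""
    acc = traj[2:] - 2.0 * traj[1:-1] + traj[:-2]
    return float(np.sum(acc ** 2))

def safety_cost(traj, obstacles, margin):
    """Squared hinge penalty for waypoints closer than `margin` to any obstacle point."""
    # Pairwise distances between T waypoints and M obstacle points: shape (T, M).
    d = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1)
    return float(np.sum(np.maximum(0.0, margin - d) ** 2))

def regularized_cost(traj, obstacles, margin=0.3, w_smooth=1.0, w_safe=10.0):
    """Weighted sum of the two penalties (weights are illustrative assumptions)."""
    return w_smooth * smoothness_cost(traj) + w_safe * safety_cost(traj, obstacles, margin)

# A straight path is perfectly smooth; it is penalised only when an obstacle is near.
straight = np.stack([np.linspace(0.0, 1.0, 6), np.zeros(6)], axis=-1)
far_obstacle = np.array([[0.5, 2.0]])
near_obstacle = np.array([[0.5, 0.1]])
cost_far = regularized_cost(straight, far_obstacle)
cost_near = regularized_cost(straight, near_obstacle)
```

A penalty of this kind could be added to a training loss or used to rank candidate trajectories; either use is a design choice outside what the abstract states.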

Index terms

Vision-Based Navigation, Autonomous Vehicle Navigation, Planning under Uncertainty