StepNav: Structured Trajectory Priors for Efficient and Multimodal Visual Navigation
Xubo Luo, Aodi Wu, Haodong Han, Xue Wan∗, Wei Zhang, Leizheng Shu and Ruisuo Wang
AI summary
Problem
Generative visual navigation models typically initialize from unstructured noise, causing inefficient refinement, poor handling of perceptual ambiguity, and compromised safety for real-time robotics.
Approach
StepNav learns a success probability field from visual features to extract a structured, multimodal mixture of candidate paths, which initializes a regularized conditional flow-matching process for rapid trajectory refinement.
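As a rough illustration of what initializing from a structured, multimodal prior can look like, the sketch below draws a starting trajectory from a mixture centred on candidate paths rather than from pure noise. Everything here is an assumption made for illustration: the function name `sample_mixture_prior`, the waypoint format, the use of per-path scores as mixture weights, and the isotropic Gaussian perturbation are not taken from the paper.

```python
import random


def sample_mixture_prior(candidates, weights, sigma=0.05, rng=None):
    """Draw one initial trajectory from a mixture centred on candidate paths.

    candidates: list of paths, each a list of (x, y) waypoints (assumed format).
    weights:    non-negative mixture weights, e.g. per-path success scores
                (an assumption; the paper's weighting is not specified here).
    sigma:      standard deviation of the Gaussian noise added per coordinate.
    Returns (index of the chosen mode, perturbed path).
    """
    if rng is None:
        rng = random.Random()
    if sum(weights) <= 0:
        raise ValueError("weights must sum to a positive value")
    # Pick one mode (candidate path) in proportion to its weight.
    k = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
    # Perturb the chosen path so samples cover a neighbourhood of that mode.
    noisy = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
             for x, y in candidates[k]]
    return k, noisy
```

Sampling repeatedly with weights spread over several candidates yields starting points near each corridor, which is one simple way to keep distinct options (for example, left and right branches at a junction) alive before refinement.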
Key results
- 95% success rate and lowest collision rate on indoor benchmarks
- 8.5 Hz real-time inference on edge hardware by reducing integration steps to 5
- Effective resolution of ambiguous junctions via multimodal path candidates
- Superior path efficiency and smoothness compared to state-of-the-art planners
Why it matters
Provides a practical, high-performance planning framework for autonomous robots navigating complex, uncertain environments in real time.
Abstract
Visual navigation is fundamental to autonomous systems, yet generating reliable trajectories in cluttered and uncertain environments remains a core challenge. Recent generative models promise end-to-end synthesis, but their reliance on unstructured noise priors often yields unsafe, inefficient, or unimodal plans that cannot meet real-time requirements. We propose StepNav, a novel framework that bridges this gap by introducing structured, multimodal trajectory priors derived from variational principles. StepNav first learns a geometry-aware success probability field to identify all feasible navigation corridors. These corridors are then used to construct an explicit, multimodal mixture prior that initializes a conditional flow-matching process. This refinement is formulated as an optimal control problem with explicit smoothness and safety regularization. By replacing unstructured noise with physically grounded candidates, StepNav generates safer and more efficient plans in significantly fewer steps. Experiments in both simulation and real-world benchmarks demonstrate consistent improvements in robustness, efficiency, and safety over state-of-the-art generative planners, advancing reliable trajectory generation for practical autonomous navigation. The code has been released at https://github.com/LuoXubo/StepNav.
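To make the refinement step more concrete, here is a minimal sketch of few-step flow integration: a trajectory is pushed from t = 0 to t = 1 by fixed-step Euler integration of a velocity field. It is not the released StepNav implementation. The names `euler_refine`, `smoothness_cost`, and `make_toy_velocity` are hypothetical, the toy velocity stands in for a learned, observation-conditioned network, and the squared second-difference penalty is only one common choice of smoothness measure, assumed here because the abstract does not give the regularizer's form.

```python
def euler_refine(x0, velocity, num_steps=5):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    x0:       initial trajectory, a list of (x, y) waypoints (e.g. a prior sample).
    velocity: callable (trajectory, t) -> list of (vx, vy); a learned network
              in practice, a toy function in this sketch.
    """
    dt = 1.0 / num_steps
    x = [tuple(p) for p in x0]
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [(px + dt * vx, py + dt * vy) for (px, py), (vx, vy) in zip(x, v)]
    return x


def smoothness_cost(path):
    """Sum of squared second differences along the path (assumed penalty form)."""
    cost = 0.0
    for a, b, c in zip(path, path[1:], path[2:]):
        ddx = a[0] - 2.0 * b[0] + c[0]
        ddy = a[1] - 2.0 * b[1] + c[1]
        cost += ddx * ddx + ddy * ddy
    return cost


def make_toy_velocity(target):
    """Hypothetical stand-in for a learned field: straight-line flow to `target`."""
    def velocity(x, t):
        scale = 1.0 / (1.0 - t)  # t < 1 on every Euler step, so this is finite
        return [((tx - px) * scale, (ty - py) * scale)
                for (px, py), (tx, ty) in zip(x, target)]
    return velocity
```

With this toy field, a jagged starting path such as `[(0, 0), (1, 0.4), (2, -0.3), (3, 0)]` refined toward a straight target ends up with a lower `smoothness_cost` than it started with. The default of five steps mirrors the step count quoted in the summary above; with a learned field, the number of steps needed would depend on how close the prior already is to a good plan.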