ICRA 2026

LoGoPlanner: Localization Grounded Navigation Policy with Metric-Aware Visual Geometry

Jiaqi Peng, Wenzhe Cai, Yuqiang Yang, Tai Wang, Yuan Shen, Jiangmiao Pang

AI summary

LoGoPlanner achieves fully end-to-end navigation by integrating implicit localization and metric-aware geometry into a diffusion policy, outperforming oracle-localization baselines by over 27.3%.

End-to-end navigation; implicit localization; metric geometry; diffusion policy; robotic navigation; visual geometry

Problem

Existing end-to-end navigation methods still rely on explicit localization modules requiring precise sensor calibration, limiting generalization across robots and environments, while traditional modular pipelines suffer from cascading errors and latency.

Approach

The framework fine-tunes a long-horizon visual-geometry backbone to implicitly estimate metric-scale state and reconstruct surrounding scene geometry, then conditions a diffusion-based policy on these implicit features for direct trajectory generation without explicit calibration.
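
As a rough illustration of what "conditioning a diffusion-based policy" for trajectory generation can look like, the sketch below runs a standard DDPM-style reverse process over a short 2D waypoint sequence. It is not the authors' implementation. The noise predictor is a hypothetical closed-form stand-in for a learned network, and the conditioning vector `cond` is assumed here to be a simple 2D goal offset rather than the implicit geometry features described above. The names `make_schedule`, `toy_noise_predictor`, and `sample_trajectory`, the horizon, and the noise schedule are all assumptions made for this example.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (values chosen for illustration only)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_noise_predictor(x_t, t, cond, alpha_bars):
    """Hypothetical stand-in for a learned network eps_theta(x_t, t, cond).

    It assumes the clean trajectory is a straight line from the origin to the
    goal stored in cond, and inverts
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    to recover the noise. A real policy would learn this mapping from data.
    """
    horizon = x_t.shape[0]
    x0_guess = np.linspace(0.0, 1.0, horizon)[:, None] * cond[None, :]
    ab = alpha_bars[t]
    return (x_t - np.sqrt(ab) * x0_guess) / np.sqrt(1.0 - ab)


def sample_trajectory(cond, horizon=8, num_steps=50, seed=0):
    """Denoise Gaussian noise into a (horizon, 2) waypoint sequence."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal((horizon, 2))  # start from pure noise
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(x, t, cond, alpha_bars)
        # Posterior mean of the standard DDPM reverse step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x


if __name__ == "__main__":
    goal = np.array([2.0, 1.0])  # hypothetical conditioning vector
    print(np.round(sample_trajectory(goal), 3))
```

Because the stand-in predictor is exact by construction, the sampled waypoints land on the straight line to the goal. In a learned policy the conditioning vector would instead carry whatever features the backbone produces, and the predictor would be a trained network.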

Key results

27.3% improvement over oracle-localization baselines in simulation
Robust cross-embodiment and cross-environment generalization in real-world tests
Fully end-to-end design reduces cumulative error and planning latency
Metric-aware geometry memory enhances obstacle avoidance and planning consistency

Why it matters

Enables reliable, calibration-free autonomous navigation for diverse robotic platforms in unstructured environments by unifying perception, localization, and planning.

Abstract

Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, thereby limiting generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) finetuning a long-horizon visual-geometry backbone to ground predictions with absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error while metric-aware geometry memory enhances planning consistency and obstacle avoidance, leading to more than a 27.3% improvement over oracle-localization baselines and strong generalization across embodiments and environments. The code and models have been made publicly available on the project page.

Index terms

Vision-Based Navigation; Motion and Path Planning; RGB-D Perception