ICRA 2026

MASt3R-Nav: WayPixel Navigation in Relative 3D Maps

AI summary

Conditioning a navigation controller on a dense, pixel-level costmap derived from relative 3D geometry improves trajectory prediction accuracy and robustness compared with conditioning on image- or object-level representations.

Visual navigation, pixel-relative mapping, WayPixel costmap, MASt3R, topological planning, learned control

Problem

Classical 3D maps require globally consistent geometry, while image- or object-relative topological graphs sacrifice geometric understanding, limiting navigation to teach-and-repeat or coarse control. There is a need for a representation that balances geometric precision with computational feasibility without requiring global registration.

Approach

The authors construct a pixel-level topological graph using relative 3D correspondences from the MASt3R model, compute shortest-path costs to generate a dense 'WayPixel Costmap,' and train a neural controller conditioned on this fine-grained costmap to predict trajectory rollouts.
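The shortest-path step can be illustrated with a minimal sketch, assuming the pixel-level graph is already available as a list of weighted, undirected edges between hypothetical `(image_id, pixel_index)` nodes. A multi-source Dijkstra search started from the goal pixels assigns every reachable pixel its cost-to-goal; reading those costs back for the robot's current image gives a dense per-pixel costmap in the spirit of the WayPixel Costmap. The function name `waypixel_costmap`, the node encoding, and the edge format are illustrative assumptions, not the authors' implementation.

```python
import heapq
from collections import defaultdict


def waypixel_costmap(edges, goal_nodes, query_image, num_pixels):
    """Return cost-to-goal for pixels 0..num_pixels-1 of `query_image`.

    Hypothetical interface: `edges` is an iterable of (u, v, weight) with
    non-negative weights, treated as undirected; nodes are
    (image_id, pixel_index) tuples. Unreachable pixels get float("inf").
    """
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))
        graph[v].append((u, w))

    # Multi-source Dijkstra: every goal pixel starts at cost 0.
    dist = {g: 0.0 for g in goal_nodes}
    heap = [(0.0, g) for g in goal_nodes]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))

    # Dense costmap for the query image, one value per pixel index.
    return [dist.get((query_image, p), float("inf")) for p in range(num_pixels)]


if __name__ == "__main__":
    edges = [
        ((0, 0), (0, 1), 1.0),  # intra-image edges in image 0
        ((0, 1), (0, 2), 1.0),
        ((1, 0), (1, 1), 1.0),  # intra-image edge in image 1
        ((0, 2), (1, 0), 0.0),  # cross-image correspondence, assumed zero cost
    ]
    print(waypixel_costmap(edges, goal_nodes=[(1, 1)], query_image=0, num_pixels=3))
    # → [3.0, 2.0, 1.0]
```

Zero-cost correspondence edges encode the assumption that matched pixels in two images observe the same 3D point, so cost propagates between images only through those correspondences.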

Key results

Proposes MASt3R-Nav, a topological navigation pipeline using pixel-relative connectivity
Introduces the WayPixel Costmap as a dense planning-to-control interface
Trains PixelReact, a controller that exploits fine-grained cost gradients for robust trajectory rollout
Outperforms object- and image-level baselines with an SPL of 81.77 in simulated and real-world tests
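The 81.77 figure is an SPL (Success weighted by Path Length) score, which the summary does not define. Under the standard definition (Anderson et al., 2018), SPL = (1/N) Σ S_i · ℓ_i / max(p_i, ℓ_i), where S_i is the binary success of episode i, ℓ_i is the shortest-path length from start to goal, and p_i is the length of the path the agent actually took. A minimal sketch of that definition, assuming the reported value is this quantity scaled to a percentage; the function name `spl` and the episode data are hypothetical:

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length, averaged over N episodes.

    successes: 0/1 per episode; shortest_lengths: shortest start-to-goal
    distance (assumed > 0); path_lengths: distance the agent travelled.
    """
    n = len(successes)
    if n == 0 or not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("need equally sized, non-empty episode lists")
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        if l <= 0:
            raise ValueError("shortest path length must be positive")
        # Failed episodes contribute 0; detours shrink the weight below 1.
        total += s * l / max(p, l)
    return total / n


# Three hypothetical episodes: optimal success, success with a 2x detour, failure.
score = spl([1, 1, 0], [10.0, 10.0, 10.0], [10.0, 20.0, 15.0])
print(f"SPL = {100 * score:.2f}")  # → SPL = 50.00
```

Because success is weighted by path efficiency, an agent that reaches every goal but with long detours can still score well below 100.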

Why it matters

Provides a computationally feasible, geometrically precise navigation framework that improves robotic control robustness without requiring globally consistent 3D maps.

Abstract

Visual navigation ability is strongly tied to its underlying representation of the world. Unlike classical 3D maps that require globally consistent geometry, image- or object-relative topological graphs almost entirely do away with geometric understanding. However, this comes at the cost of navigation capability, often limiting it to merely teach-and-repeat. In this work, we propose a novel map representation in the form of pixel-relative connectivity, which is geometrically accurate but does not require global geometric consistency. Inspired by recent progress in 3D grounded image matching, we construct a map from an image sequence through inter-image connectivity based on pixel correspondences in the relative 3D coordinate systems of individual image pairs. We then use this pixel-level graph to perform global path planning by approximating and sparsifying intra-image pixel connectivity. Through this, we derive a “WayPixel Costmap” representation and train a controller conditioned on it to predict a trajectory rollout. We show that this dense pixel-level costmap based on relative geometry is a more accurate conditioning variable for control prediction than its image- and object-level counterparts. This enables a highly capable navigation system, as validated on four types of navigation tasks in the simulator and through real-world demonstrations.
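The step of approximating and sparsifying intra-image pixel connectivity can be sketched under loudly stated assumptions: each image has one per-pixel pointmap (an H×W grid of 3D points in a relative frame, the kind of output MASt3R-style models produce), only every `stride`-th pixel is kept, each kept pixel is linked to its right and lower neighbours on that coarse grid with a weight equal to their 3D Euclidean distance, and cross-image pixel correspondences become zero-cost edges. The function names, the grid scheme, and the edge weights are hypothetical choices for illustration, not the paper's method.

```python
import math


def sparse_intra_image_edges(pointmap, image_id, stride):
    """Hypothetical sparsified intra-image connectivity.

    pointmap[r][c] is an (x, y, z) point in the image's relative frame.
    Keeps every `stride`-th pixel in both directions and links each kept
    pixel to its right and lower grid neighbours; the edge weight is the
    3D Euclidean distance. Nodes are (image_id, r * width + c).
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    h, w = len(pointmap), len(pointmap[0])
    edges = []
    for r in range(0, h, stride):
        for c in range(0, w, stride):
            u = (image_id, r * w + c)
            for dr, dc in ((0, stride), (stride, 0)):
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    v = (image_id, rr * w + cc)
                    weight = math.dist(pointmap[r][c], pointmap[rr][cc])
                    edges.append((u, v, weight))
    return edges


def inter_image_edges(matches, image_a, image_b):
    """Hypothetical zero-cost links for (pixel_index_a, pixel_index_b) matches."""
    return [((image_a, pa), (image_b, pb), 0.0) for pa, pb in matches]


if __name__ == "__main__":
    # Toy 3x3 pointmap: pixel (r, c) sits at 3D point (c, r, 0).
    pm = [[(c, r, 0.0) for c in range(3)] for r in range(3)]
    print(len(sparse_intra_image_edges(pm, image_id=0, stride=2)))  # → 4
```

A larger `stride` trades connectivity detail for a smaller graph; the output is a plain `(u, v, weight)` edge list that any shortest-path routine can consume.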

Index terms

Vision-Based Navigation, Integrated Planning and Learning, Perception-Action Coupling