EZREAL: Enhancing Zero-Shot Outdoor Robot Navigation Toward Distant Targets under Varying Visibility
Tianle Zeng, Jianwei Peng, Hanjing Ye, Guangcheng Chen, Senzi Luo, Hong Zhang
AI summary
Problem
Outdoor zero-shot navigation fails when distant targets shrink to tiny projections and suffer from intermittent occlusion, as depth-based mapping becomes infeasible and standard detectors lose reliability.
Approach
The system uses an aligned multi-scale image tile hierarchy to amplify weak semantic cues from far-field targets, then fuses historical saliency-weighted headings to maintain direction during temporary invisibility.
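The bottom-up aggregation over an aligned tile hierarchy can be illustrated with a small sketch. It is not the authors' implementation: it assumes per-tile target scores are already available on the finest grid (for example from some image-text similarity model), uses max-pooling as a stand-in for the paper's target-saliency fusion, and converts the strongest coarse column to a heading with a linear pixel-to-angle approximation. The function names, the pooling factor, and the field-of-view parameter are all hypothetical.

```python
import numpy as np


def aggregate_tile_saliency(fine_scores: np.ndarray, factor: int = 2,
                            levels: int = 2) -> list:
    """Build coarser layers from fine-tile scores without rescaling the image.

    Assumption: each coarse tile covers exactly factor x factor finer tiles
    (the "aligned" property), and max-pooling stands in for the fusion rule.
    """
    layers = [fine_scores]
    current = fine_scores
    for _ in range(levels):
        rows, cols = current.shape
        if rows % factor or cols % factor:
            raise ValueError("grid must divide evenly so tiles stay aligned")
        # Group the grid into aligned factor x factor blocks, then pool each.
        blocks = current.reshape(rows // factor, factor, cols // factor, factor)
        current = blocks.max(axis=(1, 3))
        layers.append(current)
    return layers


def heading_from_coarse_layer(coarse: np.ndarray,
                              horizontal_fov_deg: float = 90.0) -> float:
    """Return a heading offset in degrees (positive = right of image centre).

    Assumption: a linear mapping from image column to angle, which is only
    a rough approximation of a real camera model.
    """
    column_scores = coarse.max(axis=0)
    col = int(np.argmax(column_scores))
    n_cols = coarse.shape[1]
    # Centre of the winning column in normalised coordinates [-0.5, 0.5].
    offset = (col + 0.5) / n_cols - 0.5
    return offset * horizontal_fov_deg


if __name__ == "__main__":
    scores = np.zeros((4, 8))
    scores[1, 6] = 0.9  # one weak far-field cue on the right side
    pyramid = aggregate_tile_saliency(scores, factor=2, levels=2)
    print([layer.shape for layer in pyramid])
    print(heading_from_coarse_layer(pyramid[-1], horizontal_fov_deg=90.0))
```

Because every layer is computed from the one below it by a fixed pooling rule, the result is deterministic, and a single strong fine tile survives to the coarsest layer where it can set the heading.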
Key results
- Detects semantic targets beyond 150 m with sustained low angular error
- Maintains correct heading through visibility changes with 82.6% probability
- Improves overall task success by 17.5% over state-of-the-art methods
- Enables reliable target re-identification via saliency-weighted heading fusion and active search
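One plausible reading of saliency-weighted heading fusion is a weighted circular mean over remembered keyframe headings. The sketch below is an assumption made for illustration, not the paper's formulation: headings are taken as degrees in a common frame, saliencies as non-negative weights, and the function name and the ambiguity threshold are hypothetical.

```python
import math
from typing import Optional, Sequence


def fuse_headings(headings_deg: Sequence[float],
                  saliencies: Sequence[float],
                  min_resultant: float = 1e-9) -> Optional[float]:
    """Weighted circular mean of past headings, in degrees within [-180, 180].

    Returns None when the weighted vectors cancel out (or no history exists),
    which a caller could treat as a cue to start an active search.
    """
    if len(headings_deg) != len(saliencies):
        raise ValueError("each heading needs exactly one saliency weight")
    # Sum unit vectors scaled by saliency so that angles wrap correctly.
    x = sum(w * math.cos(math.radians(h)) for h, w in zip(headings_deg, saliencies))
    y = sum(w * math.sin(math.radians(h)) for h, w in zip(headings_deg, saliencies))
    if math.hypot(x, y) < min_resultant:
        return None
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Two keyframes straddling the wrap-around point fuse to roughly 0 degrees,
    # where a plain arithmetic mean of 350 and 10 would give 180.
    print(fuse_headings([350.0, 10.0], [1.0, 1.0]))
    # A more salient keyframe pulls the fused heading toward itself.
    print(fuse_headings([0.0, 90.0], [3.0, 1.0]))
```

Averaging on the unit circle rather than on raw angles avoids the wrap-around error at 0/360 degrees, and weighting by saliency lets confident past observations dominate while the target is temporarily hidden.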
Why it matters
Provides a practical, real-time navigation solution for outdoor exploration and search-and-rescue robots operating in unstructured, long-range environments.
Abstract
Zero-shot object navigation (ZSON) in large-scale outdoor environments faces many challenges; we specifically address a coupled one: long-range targets that reduce to tiny projections and intermittent visibility due to partial or complete occlusion. We present a unified, lightweight closed-loop system built on an aligned multi-scale image tile hierarchy. Through hierarchical target–saliency fusion, it summarizes localized semantic contrast into a stable coarse-layer regional saliency that provides the target direction and indicates target visibility. This regional saliency supports visibility-aware heading maintenance through keyframe memory, saliency-weighted fusion of historical headings, and active search during temporary invisibility. The system avoids whole-image rescaling, enables deterministic bottom-up aggregation, supports zero-shot navigation, and runs efficiently on a mobile robot. Across simulation and real-world outdoor trials, the system detects semantic targets beyond 150 m, maintains a correct heading through visibility changes with 82.6% probability, and improves overall task success by 17.5% compared with the SOTA methods, demonstrating robust ZSON toward distant and intermittently observable targets.

This work was supported in part by Shenzhen Science and Technology Program (No. SGDX20240115111759002), in part by Meituan Academy of Robotics Shenzhen, in part by the Shenzhen Association for Science and Technology (No. XHXS2025-003), and in part by High level of special funds (G03034K003) from Southern University of Science and Technology, Shenzhen, China. All authors are with the Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology. Corresponding author: Hong Zhang (hzhang@sustech.edu.cn). This paper’s Figure 2 was created with the assistance of a generative AI tool. The authors directed the process and are fully responsible for the figure’s final content and scientific accuracy.
Project Page: https://tianlezeng.github.io/EzReal/