ApexNav: An Adaptive Exploration Strategy for Zero-Shot Object Navigation with Target-Centric Semantic Fusion
Mingjie Zhang, Yuheng Du, Chengkai Wu, Jinni Zhou, Zhenchao Qi, Jun Ma, Boyu Zhou
AI summary
Problem
Current zero-shot object navigation methods struggle with inefficient exploration in weakly semantic environments and unreliable target identification due to noisy single-frame detections or rigid fusion strategies.
Approach
The framework dynamically switches between semantic reasoning and geometry-based exploration based on environmental cue strength, while using a target-centric fusion method to accumulate multi-frame evidence for reliable object identification.
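The mode switch described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each frontier carries a semantic relevance score and a path cost, and it uses the spread of those scores as a stand-in for "cue strength". The names `choose_mode`, `pick_frontier`, and `spread_threshold`, as well as the 0.1 default, are hypothetical.

```python
from statistics import pstdev


def choose_mode(frontier_scores, spread_threshold=0.1):
    """Pick an exploration mode from semantic scores of candidate frontiers.

    Assumption: well-separated scores mean the semantic cues are strong
    enough to follow; near-uniform scores mean they are not informative.
    """
    if len(frontier_scores) < 2:
        # Nothing to compare against, so fall back to geometry.
        return "geometric"
    spread = pstdev(frontier_scores)
    return "semantic" if spread >= spread_threshold else "geometric"


def pick_frontier(frontiers, mode):
    """Select the next frontier to visit.

    Each frontier is a dict with 'score' (semantic relevance, higher is
    better) and 'distance' (path cost, lower is better).
    """
    if mode == "semantic":
        return max(frontiers, key=lambda f: f["score"])
    return min(frontiers, key=lambda f: f["distance"])
```

With scores such as 0.9, 0.2, and 0.1 the sketch selects the semantic mode and heads for the highest-scoring frontier; with scores such as 0.30, 0.31, and 0.29 it falls back to the nearest frontier.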
Key results
- Adaptive exploration strategy that switches between semantic and geometric modes based on cue distribution
- Target-centric semantic fusion that aggregates multi-frame detections with confidence weighting
- State-of-the-art zero-shot navigation performance on HM3Dv1, HM3Dv2, and MP3D datasets
- Successful real-world deployment validating sim-to-real transfer
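The confidence-weighted aggregation in the second bullet can be sketched as a running weighted mean per tracked object. This is an illustrative assumption about how multi-frame evidence might be accumulated, not the paper's fusion rule: `ObjectBelief`, `update`, `is_target`, and the 0.5 threshold are hypothetical names and values, and the per-observation weight is supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class ObjectBelief:
    """Long-term belief that one tracked object is the navigation target."""

    label: str
    fused_conf: float = 0.0
    total_weight: float = 0.0

    def update(self, detected_conf, weight=1.0):
        """Fold one observation into the running weighted mean.

        Pass detected_conf=0.0 when the object was in view but the
        detector did not report it, so missed detections lower the belief.
        """
        if weight <= 0:
            return  # Ignore observations that carry no evidence.
        self.total_weight += weight
        # Incremental weighted mean: m += w * (x - m) / W
        self.fused_conf += weight * (detected_conf - self.fused_conf) / self.total_weight

    def is_target(self, threshold=0.5):
        """Accept the object only if the fused confidence is high enough."""
        return self.fused_conf >= threshold
```

A single high-confidence frame followed by several low-confidence or missed frames pulls `fused_conf` down, which is the behavior that makes multi-frame fusion more robust than single-frame detection.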
Why it matters
Advances practical autonomous navigation for search and rescue and service robots by enabling reliable, efficient target search in complex, unknown environments.
Abstract
Navigating unknown environments to find a target object is a significant challenge. While semantic information is crucial for navigation, relying solely on it for decision-making may not always be efficient, especially in environments with weak semantic cues. Additionally, many methods are susceptible to misdetections, especially in environments with visually similar objects. To address these limitations, we propose ApexNav, a zero-shot object navigation framework that is both more efficient and more reliable. For efficiency, ApexNav adaptively utilizes semantic information by analyzing its distribution in the environment, guiding exploration through semantic reasoning when cues are strong and switching to geometry-based exploration when they are weak. For reliability, we propose a target-centric semantic fusion method that preserves long-term memory of the target and similar objects, enabling robust object identification even under noisy detections. We evaluate ApexNav on the HM3Dv1, HM3Dv2, and MP3D datasets, where it outperforms state-of-the-art methods in both success rate (SR) and success weighted by path length (SPL). Comprehensive ablation studies further demonstrate the effectiveness of each module. Furthermore, real-world experiments validate the practicality of ApexNav in physical environments. The code will be released at https://github.com/Robotics-STAR-Lab/ApexNav.
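For readers unfamiliar with the two reported metrics, the sketch below computes them from per-episode records using their conventional definitions: SR is the fraction of successful episodes, and SPL averages, over all episodes, the success indicator times the ratio of the shortest path length to the larger of the taken and shortest path lengths. The record keys (`success`, `shortest`, `taken`) are hypothetical, and the sketch assumes every episode has a positive shortest path length.

```python
def success_rate(episodes):
    """Fraction of episodes in which the agent reached the target."""
    return sum(1 for e in episodes if e["success"]) / len(episodes)


def spl(episodes):
    """Success weighted by path length, averaged over all episodes.

    Each episode is a dict with:
      'success'  : bool, whether the target was reached
      'shortest' : shortest path length from start to target (> 0)
      'taken'    : length of the path the agent actually travelled
    """
    total = 0.0
    for e in episodes:
        if e["success"]:
            # Failed episodes contribute zero; successful ones are
            # discounted by how much longer the taken path was.
            total += e["shortest"] / max(e["taken"], e["shortest"])
    return total / len(episodes)
```

An agent that succeeds in every episode but always travels twice the shortest distance would score an SR of 1.0 and an SPL of 0.5, which is why the two numbers are reported together.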