Global Planning for Object Navigation Via a Weighted Traveling Repairman Problem Formulation
Ruimeng Liu, Xinhang Xu, Shenghai Yuan, Lihua Xie
AI summary
Problem
Zero-shot object navigation requires agents to find targets specified only by natural language, without prior knowledge of the environment, but current methods rely on greedy planning and 2D maps that struggle in complex, occluded environments.
Approach
The authors propose WTRP-Searcher, which uses a vision-language model to score candidate viewpoints and solves a Weighted Traveling Repairman Problem to globally optimize the search path. An open-vocabulary detector dynamically updates goals, and a 3D feature map provides spatial memory.
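For reference, the "weighted waiting time" objective can be written in the usual WTRP form below. This is a sketch of the standard formulation, not necessarily the paper's exact one: the notation ($v_0$ for the robot's start, $w_i$ for a viewpoint weight such as a VLM score, $c(\cdot,\cdot)$ for travel cost, $\pi$ for a visiting order) is assumed here, and the paper's cost and weight definitions may differ.

```latex
% Assumed notation: start v_0, viewpoints v_1..v_n, weights w_i >= 0 (e.g. VLM scores),
% travel cost c(u, v), visiting order pi = (pi_1, ..., pi_n) with pi_0 := v_0.
\begin{aligned}
t_{\pi_k} &= \sum_{j=1}^{k} c(\pi_{j-1}, \pi_j), \qquad k = 1, \dots, n,\\
\pi^{*} &= \arg\min_{\pi} \sum_{k=1}^{n} w_{\pi_k}\, t_{\pi_k}
         = \arg\min_{\pi} \sum_{k=1}^{n} c(\pi_{k-1}, \pi_k) \sum_{j=k}^{n} w_{\pi_j}.
\end{aligned}
```

The second form shows why this differs from a Traveling Salesman tour: each leg's cost is charged to the total weight of every viewpoint not yet visited, so high-weight viewpoints tend to be visited early instead of simply minimizing total path length.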
Key results
- Proposes WTRP-Searcher, a training-free framework integrating open-vocabulary detection, 3D mapping, and WTRP global planning
- Introduces a global goal selection policy that replaces greedy strategies with optimized weighted viewpoint tours
- Develops a real-time multi-modal mapping system for efficient object localization and environmental memory
- Demonstrates superior performance over state-of-the-art baselines in both simulated and real-world environments
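To illustrate how a weighted-latency tour can differ from a greedy choice, here is a small self-contained Python sketch. It is not the authors' implementation. The 2D positions, Euclidean travel cost, and weights standing in for VLM scores are assumptions; `wtrp_exact` is a brute-force solver over all orderings (only practical for a handful of viewpoints), and `greedy_nearest` is a hypothetical nearest-viewpoint baseline. The next navigation goal is taken to be the first viewpoint of the best order.

```python
import math
from itertools import permutations

# Assumption: viewpoints are 2D points and travel cost is straight-line distance.
Point = tuple[float, float]


def travel_cost(a: Point, b: Point) -> float:
    """Hypothetical travel cost between two positions (Euclidean distance)."""
    return math.dist(a, b)


def weighted_latency(start: Point, order: list[int], points: list[Point],
                     weights: list[float]) -> float:
    """WTRP objective: sum of weight * arrival time along the visiting order."""
    t, total, current = 0.0, 0.0, start
    for i in order:
        t += travel_cost(current, points[i])  # arrival (waiting) time at viewpoint i
        total += weights[i] * t
        current = points[i]
    return total


def wtrp_exact(start: Point, points: list[Point], weights: list[float]) -> list[int]:
    """Exact WTRP by enumerating every order; O(n!), so only for tiny n."""
    best = min(permutations(range(len(points))),
               key=lambda order: weighted_latency(start, list(order), points, weights))
    return list(best)


def greedy_nearest(start: Point, points: list[Point]) -> list[int]:
    """Hypothetical greedy baseline: always move to the closest unvisited viewpoint."""
    remaining = set(range(len(points)))
    order, current = [], start
    while remaining:
        nxt = min(remaining, key=lambda i: travel_cost(current, points[i]))
        order.append(nxt)
        remaining.remove(nxt)
        current = points[nxt]
    return order


if __name__ == "__main__":
    start = (0.0, 0.0)
    points = [(1.0, 0.0), (-3.0, 0.0)]  # a close viewpoint and a farther one
    weights = [0.1, 0.9]                 # assumed scores: the farther one looks more promising
    greedy = greedy_nearest(start, points)
    best = wtrp_exact(start, points, weights)
    print(greedy, weighted_latency(start, greedy, points, weights))  # [0, 1], cost about 4.6
    print(best, weighted_latency(start, best, points, weights))      # [1, 0], cost about 3.4
    print("next goal:", best[0])                                     # next goal: 1
```

With these toy numbers the nearest-first order has a weighted latency of about 4.6, while the WTRP order reaches about 3.4 by heading to the high-scoring viewpoint first even though it is farther away. In a closed loop one would presumably refresh the weights as new observations arrive and re-solve, executing only the first leg each time.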
Why it matters
Provides a scalable, training-free navigation strategy that improves search efficiency and robustness for real-world robotic applications in complex environments.
Abstract
Zero-Shot Object Navigation (ZSON) requires agents to navigate to objects specified via open-ended natural language without predefined categories or prior environmental knowledge. While recent methods leverage foundation models or multi-modal maps, they often rely on 2D representations and greedy strategies, or require additional training or computationally expensive modules, limiting their performance in complex environments and real-world applications. We propose WTRP-Searcher, a novel framework that formulates ZSON as a Weighted Traveling Repairman Problem (WTRP), minimizing the weighted waiting time of viewpoints. Using a Vision-Language Model (VLM), we score viewpoints based on object-description similarity and project these scores onto a 2D map using depth information. An open-vocabulary detector identifies targets and dynamically updates goals, while a 3D embedding feature map enhances spatial awareness and environmental recall. WTRP-Searcher outperforms existing methods, offering efficient global planning and improved performance in complex ZSON tasks. Code and demos will be available at https://github.com/lrm20011/WTRP Searcher.
This work is supported by the National Research Foundation of Singapore under its Medium-Sized Center for Advanced Robotics Technology Innovation. This work was also supported by Chery International under its collaboration projects with Nanyang Technological University. All authors are with the Centre for Advanced Robotics Technology Innovation, School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798. Email: {ruimeng.liu,xinhang.xu,shyuan,elhxie}@ntu.edu.sg. ∗Corresponding author.