IROS 2024

MG-VLN: Benchmarking Multi-Goal and Long-Horizon Vision-Language Navigation with Language Enhanced Memory Map

Junbo Zhang, Kaisheng Ma

Abstract

Vision-Language Navigation (VLN) with high-level language instructions is a crucial task in robotics. Existing VLN benchmarks, such as the REVERIE challenge, which has single-goal instructions and limited navigation steps, do not fully capture the complexity of real-world navigation, which often requires pursuing multiple objectives over long horizons. To address this, we propose a new benchmark task: Multi-Goal and Long-Horizon Vision-Language Navigation (MG-VLN), which extends the REVERIE benchmark to multi-objective and long-horizon navigation scenarios with sequences of high-level instructions. This task aims to provide a simulation benchmark to guide the design of lifelong and long-horizon navigation robots. To begin exploring this newly proposed task, we first investigate the role of long-term memory in improving navigation performance by leveraging environmental information gathered during previous sub-goals. Additionally, we examine the types of knowledge that most effectively enrich this long-term memory. Specifically, we integrate visual content with linguistic knowledge such as object categories, visual captions, and object attributes and relationships. Our findings indicate that: 1) an explicit long-term memory map significantly enhances navigation performance in multi-goal and long-horizon scenarios; 2) incorporating object attribute and relationship information is the most advantageous for aligning environmental cues with high-level instructions.

Index terms

Vision-Based Navigation; AI-Based Methods; Visual Learning