IROS 2024

A Language-Driven Navigation Strategy Integrating Semantic Maps and Large Language Models

Zhengjun Zhong, Ying He, Pengteng Li, Fei Yu, Fei Ma

Abstract

Accurate perception of semantic and spatial information is crucial for robots performing language-driven navigation tasks. Existing approaches use visual-language models to extract semantic information from the environment and build maps. However, these maps are limited by the generalization ability and accuracy of the underlying models, so they can be inaccurate or incomplete, which in turn degrades navigation accuracy. Motivated by the strong classification and segmentation capabilities of foundation models, this study introduces a semantic map built with foundation models. We use a foundation model to semantically segment objects in the robot's video stream and fuse the resulting semantics onto the map. This map is then combined with large language models (LLMs), which receive natural language instructions, to complete the navigation task. Extensive experiments in a simulated environment show that our method outperforms existing methods on language-driven navigation tasks.
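The abstract does not give implementation details. As an illustration only, here is a minimal sketch of one common way to fuse per-pixel segmentation labels into a top-down grid map by majority voting and then look up a goal object named in an instruction. The pinhole back-projection, the planar (x, y, yaw) pose, the cell size, the voting rule, and the keyword-matching `parse_goal` stand-in for the LLM are all assumptions rather than the authors' method, and every class and function name is hypothetical.

```python
from collections import Counter, defaultdict
import math

# Hypothetical sketch, NOT the authors' implementation: a 2D top-down grid whose
# cells accumulate semantic-label votes from segmented pixels.


class SemanticGridMap:
    def __init__(self, cell_size=0.1):
        self.cell_size = cell_size              # metres per cell (assumed value)
        self.votes = defaultdict(Counter)       # (i, j) -> Counter{label: count}

    def _cell(self, x, y):
        # Map a world-frame point to integer grid indices.
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def fuse_pixel(self, u, depth, label, fx, cx, pose):
        """Project one segmented pixel onto the ground plane and vote for its label.

        u: image column; depth: metric depth along the optical axis;
        fx, cx: horizontal pinhole intrinsics (height is ignored in a 2D map);
        pose: camera pose (x, y, yaw) in the world frame (assumed planar).
        """
        forward = depth
        lateral = (u - cx) * depth / fx         # positive = to the camera's right
        px, py, yaw = pose
        # Forward axis is (cos yaw, sin yaw); the right axis is (sin yaw, -cos yaw).
        wx = px + forward * math.cos(yaw) + lateral * math.sin(yaw)
        wy = py + forward * math.sin(yaw) - lateral * math.cos(yaw)
        self.votes[self._cell(wx, wy)][label] += 1

    def label_at(self, cell):
        # Majority label of a cell, or None if the cell was never observed.
        counter = self.votes.get(cell)
        return counter.most_common(1)[0][0] if counter else None

    def locate(self, goal_label):
        """World-frame centroid of the cells whose majority label is goal_label."""
        cells = [c for c in self.votes if self.label_at(c) == goal_label]
        if not cells:
            return None
        s = self.cell_size
        x = sum((i + 0.5) * s for i, _ in cells) / len(cells)
        y = sum((j + 0.5) * s for _, j in cells) / len(cells)
        return (x, y)


def parse_goal(instruction, known_labels):
    """Stand-in for the LLM step: return the first known label in the instruction."""
    for word in instruction.lower().replace(".", " ").split():
        if word in known_labels:
            return word
    return None


# Usage: a robot at the origin facing +x sees pixels on its optical axis, 2 m ahead.
grid = SemanticGridMap(cell_size=0.5)
pose = (0.0, 0.0, 0.0)
grid.fuse_pixel(u=320, depth=2.0, label="chair", fx=500.0, cx=320.0, pose=pose)
grid.fuse_pixel(u=320, depth=2.0, label="chair", fx=500.0, cx=320.0, pose=pose)
grid.fuse_pixel(u=320, depth=2.0, label="table", fx=500.0, cx=320.0, pose=pose)
goal = parse_goal("Go to the chair.", {"chair", "table"})
print(goal, grid.locate(goal))  # prints: chair (2.25, 0.25)
```

Majority voting is used here only because it is a simple way to let repeated observations override occasional segmentation errors; a real system could instead store per-class confidences or embeddings, and would replace `parse_goal` with an actual LLM query.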

Index terms

Semantic Scene Understanding, AI-Enabled Robotics