A Language-Driven Navigation Strategy Integrating Semantic Maps and Large Language Models
Zhengjun Zhong, Ying He, Pengteng Li, Fei Yu, Fei Ma
Abstract
Accurate perception of semantic and spatial information is crucial for robots performing language-driven navigation tasks. Existing approaches use vision-language models to extract semantic information from the environment and construct maps. However, because these models are limited in generalization and accuracy, the resulting maps may be inaccurate or incomplete, which degrades navigation performance. Inspired by the strong classification and segmentation capabilities of foundation models, this study introduces a semantic map constructed with foundation models. We use a foundation model to semantically segment objects in the robot’s video stream and fuse the resulting semantics onto the map. The map is then used together with large language models (LLMs), which receive natural language instructions, to complete the navigation task. Extensive experiments in a simulated environment demonstrate that our method outperforms existing ones in language-driven navigation tasks.