ReThinkNav: Zero-Shot Vision-And-Language Navigation with Open-Source LLMs Via Contextual Reasoning and Loop Recovery
Aolin Li, Yixian Yan, Hongkun Luo, Jiao Zhan, Chi Guo
AI summary
Problem
Existing open-source LLM navigators for continuous environments struggle with inaccurate instruction following and frequently fall into spatial or semantic loops.
Approach
The framework enhances instruction comprehension through goal-oriented action decomposition and progress tracking, while a dedicated loop detection module identifies revisit behaviors and prompts the LLM to re-plan alternative paths.
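The decomposition-and-tracking idea can be illustrated with a small data structure. This is a hypothetical sketch, not the ReThinkNav code. The class name `ProgressTracker`, its methods, and the prompt wording are assumptions. The sub-goals are supplied by hand and advanced explicitly, whereas in the paper decomposition and progress estimation come from the LLM's contextual reasoning.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProgressTracker:
    """Hypothetical tracker for an instruction already split into ordered sub-goals.

    Assumption: the sub-goals are given (e.g. produced earlier by prompting an LLM)
    and the caller decides when each one is finished.
    """
    sub_goals: list[str]
    completed: int = 0

    def current_goal(self) -> str | None:
        # Next unfinished sub-goal, or None once every sub-goal is done.
        if self.completed < len(self.sub_goals):
            return self.sub_goals[self.completed]
        return None

    def mark_done(self) -> None:
        # Advance to the next sub-goal; no-op after the last one.
        if self.current_goal() is not None:
            self.completed += 1

    def progress(self) -> float:
        # Fraction of sub-goals completed, in [0, 1].
        return self.completed / len(self.sub_goals) if self.sub_goals else 1.0

    def context(self) -> str:
        # Text that could be inserted into the LLM prompt at every step.
        done = "; ".join(self.sub_goals[: self.completed]) or "none"
        return (
            f"Completed sub-goals: {done}\n"
            f"Current sub-goal: {self.current_goal() or 'stop'}\n"
            f"Progress: {self.progress():.0%}"
        )
```

For example, with the sub-goals "exit the bedroom", "turn left into the hallway" and "stop at the kitchen door", one call to `mark_done()` makes `context()` report the hallway turn as the current sub-goal and the progress as 33%.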
Key results
- State-of-the-art zero-shot success rates on the R2R-CE benchmark
- Successful real-world deployment on a Unitree G1 humanoid robot
- Improved instruction following via goal-oriented action decomposition
- Effective loop escape using geometric pose consistency and CLIP similarity
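A loop check that combines pose consistency with visual similarity can be sketched as a comparison of the current observation against earlier ones. Everything here is an assumption for illustration. Poses are (x, y, heading) tuples, the embedding vectors stand in for CLIP image features computed elsewhere, and a revisit is flagged only when position, heading and visual similarity all agree. The thresholds are placeholder values, not the paper's.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# Hypothetical pose format: (x, y, heading in radians).
Pose = Tuple[float, float, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine of the angle between two embedding vectors (0.0 if either is zero).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def heading_gap(a: float, b: float) -> float:
    # Smallest absolute difference between two headings, in [0, pi].
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


@dataclass
class LoopDetector:
    # All thresholds are placeholder assumptions, not values from the paper.
    dist_thresh: float = 0.5                  # metres
    heading_thresh: float = math.radians(30)  # radians
    sim_thresh: float = 0.9                   # cosine similarity of image embeddings
    min_gap: int = 3                          # skip the most recent steps
    poses: List[Pose] = field(default_factory=list)
    embeddings: List[Sequence[float]] = field(default_factory=list)

    def step(self, pose: Pose, embedding: Sequence[float]) -> bool:
        """Record the current observation; return True if it revisits an older one."""
        # Only compare with observations at least `min_gap` steps in the past.
        cutoff = max(0, len(self.poses) - self.min_gap + 1)
        looped = False
        for past_pose, past_emb in zip(self.poses[:cutoff], self.embeddings[:cutoff]):
            near = math.dist(pose[:2], past_pose[:2]) <= self.dist_thresh
            aligned = heading_gap(pose[2], past_pose[2]) <= self.heading_thresh
            similar = cosine_similarity(embedding, past_emb) >= self.sim_thresh
            if near and aligned and similar:
                looped = True
                break
        self.poses.append(pose)
        self.embeddings.append(embedding)
        return looped
```

A `True` return marks the point where the agent would prompt the LLM to re-plan. The `min_gap` window keeps the detector from flagging ordinary consecutive steps, which naturally look alike.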
Why it matters
Provides a scalable, privacy-preserving solution for deploying autonomous navigation in real-world robotic applications without task-specific training.
Abstract
Zero-shot Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to follow natural language instructions and navigate without task-specific training. Prior works have demonstrated the potential of open-source large language models (LLMs) in zero-shot VLN-CE, yet two major limitations remain: (1) difficulty in accurately following instructions, and (2) susceptibility to loops in spatially confined or semantically similar regions. In this work, we introduce ReThinkNav, a framework designed to further advance open-source LLMs in zero-shot VLN-CE. ReThinkNav integrates contextual reasoning for enhanced instruction comprehension and progress estimation, enabling the LLM to accurately infer both the appropriate action and its rationale. In addition, a Loop Detection and Recovery (LDR) module detects loops and adjusts decisions accordingly. Experiments on the R2R-CE benchmark demonstrate excellent zero-shot performance, while real-world validation on the Unitree G1 humanoid robot confirms its practical applicability. The code is available at https://github.com/damonds27/ReThinkNav.
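On the recovery side, one simple way to "adjust decisions" once a loop is flagged is to change what the LLM sees at the next step. The function below is a hypothetical prompt builder, not the paper's prompt. Its name, parameters and wording are assumptions. It asks for both an action and a rationale, mirroring the abstract's description, and appends a re-plan warning only when a loop has been detected.

```python
from __future__ import annotations


def build_decision_prompt(
    instruction: str,
    candidates: list[str],
    progress_context: str,
    loop_detected: bool = False,
    revisited: list[str] | None = None,
) -> str:
    """Assemble a hypothetical per-step navigation prompt for an LLM."""
    lines = [f"Instruction: {instruction}", progress_context, "Candidate directions:"]
    # Number the candidates so the LLM can answer with an index.
    lines += [f"  {i}: {c}" for i, c in enumerate(candidates)]
    if loop_detected:
        # Recovery hint: tell the LLM it is looping and ask for an alternative path.
        place = ", ".join(revisited) if revisited else "a recently visited area"
        lines.append(
            f"Warning: you appear to be revisiting {place}. "
            "Choose a direction that leads somewhere new."
        )
    # Request both the action and its rationale.
    lines.append("Reply with one candidate index and a short rationale.")
    return "\n".join(lines)
```

In a navigation loop, the boolean from a loop check and the text from a progress summary would be passed in as `loop_detected` and `progress_context`. The LLM's reply would then be parsed for the chosen index.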