ICRA 2026

ReThinkNav: Zero-Shot Vision-And-Language Navigation with Open-Source LLMs Via Contextual Reasoning and Loop Recovery

Aolin Li, Yixian Yan, Hongkun Luo, Jiao Zhan, Chi Guo

AI summary

ReThinkNav enables robust zero-shot navigation with open-source LLMs by combining goal-oriented contextual reasoning with geometric loop detection and recovery.

Vision-and-Language Navigation, Zero-Shot Learning, Open-Source LLMs, Loop Recovery, Contextual Reasoning, Embodied AI

Problem

Existing open-source LLM navigators for continuous environments struggle with inaccurate instruction following and frequently fall into spatial or semantic loops.

Approach

The framework enhances instruction comprehension through goal-oriented action decomposition and progress tracking, while a dedicated loop detection module identifies revisit behaviors and prompts the LLM to re-plan alternative paths.
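As a rough illustration of goal-oriented decomposition with progress tracking, the sketch below keeps an ordered list of sub-goals and renders their status as text that could be placed in an LLM prompt. The `ProgressTracker` class, its method names, and the status labels are hypothetical and are not taken from the ReThinkNav code; in the framework itself the decomposition is presumably produced by the LLM, whereas here the sub-goals are written by hand.

```python
from dataclasses import dataclass


@dataclass
class ProgressTracker:
    """Tracks which sub-goal of a decomposed instruction is active (hypothetical helper)."""
    sub_goals: list[str]
    current: int = 0  # index of the sub-goal the agent is working on

    def mark_done(self) -> None:
        # Advance to the next sub-goal, never moving past the end of the list.
        if self.current < len(self.sub_goals):
            self.current += 1

    @property
    def finished(self) -> bool:
        return self.current >= len(self.sub_goals)

    def to_prompt(self) -> str:
        # Render one line per sub-goal with an assumed status label.
        lines = []
        for i, goal in enumerate(self.sub_goals):
            if i < self.current:
                status = "done"
            elif i == self.current:
                status = "current"
            else:
                status = "pending"
            lines.append(f"{i + 1}. [{status}] {goal}")
        return "\n".join(lines)


tracker = ProgressTracker(
    ["exit the bedroom", "turn left at the hallway", "stop at the sofa"]
)
tracker.mark_done()
print(tracker.to_prompt())
```

Passing the rendered status list back to the model at every step is one simple way to keep the current sub-goal explicit, so the next action can be chosen relative to what has already been completed.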

Key results

State-of-the-art zero-shot success rates on the R2R-CE benchmark
Successful real-world deployment on a Unitree G1 humanoid robot
Improved instruction following via goal-oriented action decomposition
Effective loop escape using geometric pose consistency and CLIP similarity
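The loop-escape result above combines a geometric check with a visual-similarity check. A minimal sketch of that idea follows, assuming 2D poses, precomputed image embeddings, and a rule that flags a revisit only when both checks agree. The function names, thresholds, and the `min_gap` parameter are illustrative assumptions, not values or APIs from the paper; in practice the embeddings would come from a CLIP image encoder rather than the short hand-written vectors used here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_revisit(pose, emb, history,
               dist_thresh=0.5, sim_thresh=0.9, min_gap=3):
    """Return True if (pose, emb) matches an earlier step in history.

    history is a list of (pose, embedding) pairs in visit order.
    The last `min_gap` steps are skipped so that standing still for a
    moment is not reported as a loop. All thresholds are assumed values.
    """
    candidates = history[:-min_gap] if min_gap > 0 else history
    for past_pose, past_emb in candidates:
        close = math.dist(pose, past_pose) <= dist_thresh
        similar = cosine_similarity(emb, past_emb) >= sim_thresh
        if close and similar:
            return True
    return False


history = [
    ((0.0, 0.0), [1.0, 0.0]),
    ((1.0, 0.0), [0.0, 1.0]),
    ((2.0, 0.0), [0.0, 1.0]),
    ((2.0, 1.0), [0.0, 1.0]),
]
# The agent is back near the start and sees a similar view.
print(is_revisit((0.1, 0.0), [0.9, 0.1], history))
```

When a revisit is flagged, the agent could add a note to the next prompt asking the LLM to pick a different direction, which is the re-planning step the summary describes. Requiring both conditions is a design choice made for this sketch; a system could equally trigger on either signal alone.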

Why it matters

Provides a scalable, privacy-preserving solution for deploying autonomous navigation in real-world robotic applications without task-specific training.

Abstract

Zero-shot Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to follow natural language instructions and navigate without task-specific training. Prior works have demonstrated the potential of open-source large language models (LLMs) in zero-shot VLN-CE, yet two major limitations remain: (1) difficulty in accurately following instructions, and (2) susceptibility to loops in spatially confined or semantically similar regions. In this work, we introduce ReThinkNav, a framework designed to further advance open-source LLMs in zero-shot VLN-CE. ReThinkNav integrates contextual reasoning for enhanced instruction comprehension and progress estimation, enabling the LLM to accurately infer both the appropriate action and its rationale. In addition, a Loop Detection and Recovery (LDR) module detects loops and adjusts decisions accordingly. Experiments on the R2R-CE benchmark demonstrate excellent zero-shot performance, while real-world validation on the Unitree G1 humanoid robot confirms its practical applicability. The code is available at https://github.com/damonds27/ReThinkNav.

Index terms

Vision-Based Navigation