ICRA 2026

Déjà Vu: Unlocking Transparent Action Reasoning for Object-Goal Navigation Via Large Language Models

Heming Du, Sen Wang, Li Xue, Yu Xin

AI summary

Off-the-shelf LLMs can directly and transparently navigate to object goals in unseen environments without fine-tuning by reflecting on retrieved expert and trial-and-error experiences.

Object-Goal Navigation, Large Language Models, Embodied AI, Action Reasoning, In-Context Learning, iTHOR

Problem

Existing methods mainly use LLMs as proxy target indicators and leave action selection to separate modules, which makes action decisions non-transparent and requires extra adaptation. This paper asks whether an LLM can instead act as the agent's central "brain", directly choosing actions and explaining its reasoning in object-goal navigation.

Approach

The Experience-aware Action Cogitator (ExAC) retrieves relevant expert demonstrations and past mistakes based on current observations, injecting them into LLM prompts to guide direct action selection and reasoning without fine-tuning.
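The retrieve-then-prompt idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Experience` record, the token-overlap similarity, the prompt wording, and the action names are all assumptions made for illustration, and the summary does not specify how ExAC actually scores or formats experiences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    kind: str         # "expert" or "trial_error" (hypothetical labels)
    observation: str  # textual description of what the agent saw
    action: str       # action taken in that situation
    note: str         # why it worked, or why it turned out suboptimal


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for whatever retriever is really used."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(memory, observation, kind, k=1):
    """Return the k stored experiences of one kind most similar to the observation."""
    pool = [e for e in memory if e.kind == kind]
    pool.sort(key=lambda e: jaccard(e.observation, observation), reverse=True)
    return pool[:k]


def build_prompt(memory, goal, observation, actions):
    """Inject retrieved experiences into a text prompt for an off-the-shelf LLM."""
    lines = [
        f"Goal object: {goal}",
        f"Current observation: {observation}",
        "Allowed actions: " + ", ".join(actions),
        "Expert experience (what worked in similar situations):",
    ]
    for e in retrieve(memory, observation, "expert"):
        lines.append(f"- saw '{e.observation}' -> {e.action} ({e.note})")
    lines.append("Trial-and-error experience (tried but suboptimal):")
    for e in retrieve(memory, observation, "trial_error"):
        lines.append(f"- saw '{e.observation}' -> {e.action} ({e.note})")
    lines.append("Choose one action and explain your reasoning.")
    return "\n".join(lines)


# Illustrative memory; the contents are made up for this sketch.
MEMORY = [
    Experience("expert", "sofa and coffee table ahead", "MoveAhead",
               "televisions are often near sofas"),
    Experience("expert", "sink and stove on the left", "RotateLeft",
               "mugs are often near sinks"),
    Experience("trial_error", "wall directly ahead", "MoveAhead",
               "blocked, no progress"),
    Experience("trial_error", "sofa ahead goal not visible", "LookDown",
               "lost sight of the room"),
]

PROMPT = build_prompt(
    MEMORY,
    goal="Television",
    observation="sofa and lamp ahead",
    actions=["MoveAhead", "RotateLeft", "RotateRight", "LookDown", "Done"],
)
```

The resulting `PROMPT` string would be sent to the LLM unchanged, so the model weights are never updated; only the retrieved context differs from step to step.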

Key results

Nearly doubles the Success Rate and SPL (Success weighted by Path Length) of vanilla LLMs, reaching peaks of 73.93% and 48.35%, respectively, in unseen iTHOR scenes
Consistently boosts navigation performance across LLaMA, Mistral, and Qwen models without task-specific training
Generates navigation actions directly, together with transparent, human-readable explanations of the reasoning behind them
Introduces a phase-aware experience retrieval mechanism that mimics human déjà vu for searching and approaching
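
For readers unfamiliar with the two metrics reported above, the sketch below computes them under the definition of SPL that is standard in the embodied navigation literature: each episode contributes its success flag multiplied by the ratio of the shortest path length to the larger of the taken and shortest path lengths. It is assumed, not confirmed by this page, that the paper uses this definition; the function and argument names are illustrative.

```python
def success_rate(successes):
    """Fraction of episodes in which the agent reached the goal."""
    return sum(successes) / len(successes)


def spl(successes, shortest_lengths, taken_lengths):
    """Success weighted by Path Length, averaged over episodes.

    Assumes every shortest path length is positive.
    """
    total = 0.0
    for s, shortest, taken in zip(successes, shortest_lengths, taken_lengths):
        # A failed episode contributes 0; a successful one is discounted
        # by how much longer the taken path was than the shortest path.
        total += float(s) * shortest / max(taken, shortest)
    return total / len(successes)


# Three toy episodes: a detour, a failure, and an optimal run.
example_sr = success_rate([1, 0, 1])
example_spl = spl([1, 0, 1], [2.0, 3.0, 4.0], [4.0, 3.0, 4.0])
```

Because the path ratio never exceeds 1, SPL is always at most the Success Rate, which is consistent with the reported 48.35% SPL sitting below the 73.93% Success Rate.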

Why it matters

It shows that general-purpose LLMs can serve as transparent, adaptable navigation policies for embodied AI, without costly fine-tuning or a separate action-selection module.

Abstract

The remarkable interaction and reasoning capabilities of Large Language Models (LLMs) make them promising in collaborative Embodied AI tasks, particularly for Object-goal Navigation (ObjNav) tasks that require both decision-making and transparent explanation. However, existing work mainly uses LLMs as proxy target indicators, leaving the role of direct action decision-making to other components. This separation causes non-transparent action decisions and extra adaptation requirements. This observation prompts us to reconsider their role: Can LLMs be transformed into the central “brain” of agents, directly outputting action choices and explaining their reasoning? In pursuit of this inquiry, we decouple perception from action reasoning to focus specifically on the feasibility of deploying LLMs as navigation policies. We introduce the Experience-aware Action Cogitator (ExAC) that integrates two kinds of experience, i.e., expert-informed experience and trial & error experience, into prompts. Inspired by David Hume’s philosophical principles that knowledge is acquired through reflective experience, these experiences are designed for two critical questions: (i) “What action should be selected as the best option?” and (ii) “What actions have been tried but proven suboptimal?” By analyzing and reflecting on these two types of experience, we show that LLMs can reason navigation actions in unseen environments effectively without costly fine-tuning. Experiments on the widely-adopted iTHOR yield significant improvements in ObjNav performance. These compelling results validate the feasibility of our ExAC. Compared to vanilla LLMs, ExAC nearly doubles both the Success Rate and the Success weighted by Path Length, reaching peak values of 73.93% and 48.35% in unseen scenes, respectively.

Index terms

Learning from Experience