ICRA 2026

Commonsense-Guided Object Graph Reasoning with Policy Regularization for Object Goal Navigation

Yiyue Meng, Aolin Li, Jiao Zhan, Shenxin Li, Chi Guo

AI summary

Integrating LLM-derived commonsense knowledge with policy regularization significantly boosts navigation success and robustness in unseen environments.

Object Goal Navigation, Commonsense Reasoning, Large Language Models, Policy Regularization, Embodied AI, Reinforcement Learning

Problem

Agents in object goal navigation fail to generalize in unseen environments because static object graphs rely on limited training data, leading to incomplete scene understanding and poor robustness.

Approach

The method injects LLM-generated co-occurrence embeddings into object graphs to guide reasoning, while using a commonsense-free model to regularize the navigation policy and prevent knowledge bias.
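As a rough illustration of the commonsense-guided half of this approach, the sketch below weights each detected object node by how similar its embedding is to the target's embedding, then pools the node features with those weights. It is a minimal sketch under stated assumptions, not the paper's architecture: the function `commonsense_weighted_context`, the two-dimensional toy embeddings, and the use of cosine similarity followed by a softmax are all hypothetical stand-ins for the LLM-derived co-occurrence embeddings and the graph reasoning module.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def commonsense_weighted_context(node_feats, node_embs, target_emb):
    """Pool object-node features, weighting each node by the similarity of
    its (hypothetical) LLM embedding to the target object's embedding."""
    scores = [cosine(e, target_emb) for e in node_embs]
    weights = softmax(scores)
    dim = len(node_feats[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, node_feats)) for d in range(dim)
    ]
    return weights, context


# Toy data (assumed, for illustration only): three detected objects and a
# target whose embedding lies close to "sofa" and "remote", far from "sink".
node_names = ["sofa", "sink", "remote"]
node_embs = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
target_emb = [1.0, 0.05]  # stands in for the target, e.g. "television"
node_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

weights, context = commonsense_weighted_context(node_feats, node_embs, target_emb)
```

In this toy setup the "sink" node receives the smallest weight, so the pooled context is dominated by objects that plausibly co-occur with the target even if that pairing was never seen during training.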

Key results

COGR extracts LLM co-occurrence embeddings to extend object graph reasoning beyond training data
PR stabilizes training via soft policy regularization, mitigating LLM knowledge bias
Achieves highest success rate and SPL on AI2-Thor and RoboThor benchmarks
Validated through successful real-world robotic deployment

Why it matters

Provides a robust, generalizable navigation framework that bridges simulated training with real-world physical environments for embodied AI.

Abstract

Object goal navigation aims to guide an agent to find a specific target object in an unseen environment using only first-person visual observations. It requires the agent to enhance scene understanding and train a robust navigation policy. To address this, we propose two complementary techniques: commonsense-guided object graph reasoning (COGR) and policy regularization (PR). Specifically, COGR improves the agent’s scene understanding by integrating object relationships, including category proximity and spatial correlation. It extracts co-occurrence embeddings of the target object from a large language model (LLM) as commonsense knowledge to guide object graph reasoning, enabling the agent to reason beyond visual co-occurrence observed in training environments. PR is a knowledge distillation-inspired regularization mechanism, where a commonsense-free model is used to regularize the navigation policy of the commonsense-guided model. We propose PR to mitigate potential performance degradation caused by knowledge bias from the LLM, enabling the training of a more robust navigation policy. Experiments in the AI2-Thor and RoboThor environments demonstrate the effectiveness and efficiency of our proposed method, and real-world deployment further validates its transferability.
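The policy regularization idea described above can be pictured as an extra penalty that keeps the commonsense-guided policy close to the commonsense-free one. The sketch below is only one plausible reading, assuming a KL-divergence penalty as commonly used in knowledge distillation; the function `regularized_loss`, the weight `beta`, the direction of the KL term, and the toy logits are all assumptions and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def regularized_loss(nav_loss, guided_logits, free_logits, beta=0.1):
    """Navigation loss plus a soft penalty pulling the commonsense-guided
    policy toward the commonsense-free policy (assumed form:
    nav_loss + beta * KL(pi_free || pi_guided))."""
    pi_guided = softmax(guided_logits)
    pi_free = softmax(free_logits)
    return nav_loss + beta * kl_divergence(pi_free, pi_guided)


# Toy logits over three actions (assumed values, for illustration only).
same = regularized_loss(1.0, [2.0, 0.5, 0.1], [2.0, 0.5, 0.1])
diff = regularized_loss(1.0, [2.0, 0.5, 0.1], [0.1, 0.5, 2.0])
```

When the two policies agree, the penalty vanishes and only the navigation loss remains; when biased commonsense pushes the guided policy far from the commonsense-free one, the loss grows in proportion to `beta`.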

Index terms

Vision-Based Navigation, Reinforcement Learning, Representation Learning