Commonsense-Guided Object Graph Reasoning with Policy Regularization for Object Goal Navigation
Yiyue Meng, Aolin Li, Jiao Zhan, Shenxin Li, Chi Guo
AI summary
Problem
Agents in object goal navigation fail to generalize to unseen environments because static object graphs are built from limited training data, leading to incomplete scene understanding and poor robustness.
Approach
The method injects LLM-generated co-occurrence embeddings into object graphs to guide reasoning, while using a commonsense-free model to regularize the navigation policy and prevent knowledge bias.
Key results
- COGR extracts LLM co-occurrence embeddings to extend object graph reasoning beyond training data
- PR stabilizes training via soft policy regularization, mitigating LLM knowledge bias
- Achieves highest success rate and SPL on AI2-Thor and RoboThor benchmarks
- Validated through successful real-world robotic deployment
Why it matters
Provides a robust, generalizable navigation framework that bridges simulated training with real-world physical environments for embodied AI.
Abstract
Object goal navigation aims to guide an agent to find a specific target object in an unseen environment using only first-person visual observations. It requires the agent to understand the scene well and to learn a robust navigation policy. To address both requirements, we propose two complementary techniques: commonsense-guided object graph reasoning (COGR) and policy regularization (PR). Specifically, COGR improves the agent’s scene understanding by integrating object relationships, including category proximity and spatial correlation. It extracts co-occurrence embeddings of the target object from a large language model (LLM) as commonsense knowledge to guide object graph reasoning, enabling the agent to reason beyond the visual co-occurrence observed in training environments. PR is a knowledge distillation-inspired regularization mechanism in which a commonsense-free model regularizes the navigation policy of the commonsense-guided model. PR mitigates potential performance degradation caused by knowledge bias from the LLM, enabling the training of a more robust navigation policy. Experiments in the AI2-Thor and RoboThor environments demonstrate the effectiveness and efficiency of the proposed method, and real-world deployment further validates its transferability.
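The abstract describes PR only at a high level, so the following is a minimal sketch of what a distillation-style policy regularizer could look like, not the authors' formulation. It assumes the regularizer is a KL divergence between the action distribution of the commonsense-free model and that of the commonsense-guided model, added to the task loss with a fixed weight. The function names, the `weight` and `temperature` parameters, and the direction of the KL term are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn action logits into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of the same length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def regularized_loss(task_loss, guided_logits, free_logits,
                     weight=0.1, temperature=1.0):
    """Task loss plus a soft penalty pulling the guided policy toward the
    commonsense-free policy. ASSUMPTION: the KL form, its direction, and the
    fixed weight are illustrative and do not come from the paper."""
    guided = softmax(guided_logits, temperature)
    free = softmax(free_logits, temperature)
    return task_loss + weight * kl_divergence(free, guided)


# Usage: the penalty vanishes when both policies agree and grows as they diverge.
same = regularized_loss(1.0, [2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
diff = regularized_loss(1.0, [2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

In this sketch the commonsense-free policy acts as a soft anchor: the guided model can still deviate where LLM knowledge helps, but large deviations cost more, which is one plausible way to limit the effect of biased commonsense knowledge.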