ICRA 2026

Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning

Yoonwoo Kim, Raghav Arora, Roberto Martín-Martín, Peter Stone, Ben Abbatematteo, Yoonchang Sung

AI summary

Incorporating LLM-derived common-sense priors and object co-location cues into belief-space planning reduces robot planning and execution time by over 60% in simulation and real-world tests.

Partially observable planning, Large language models, Task and motion planning, Belief-space planning, Common-sense reasoning, Robot state estimation

Problem

Deterministic task and motion planners fail in partially observable environments when objects are occluded or unseen, as they lack mechanisms to reason under uncertainty or leverage task-irrelevant observations.

Approach

CoCo-TAMP leverages large language models to generate initial location priors and semantic similarity-based co-location models, which guide a hierarchical Bayesian filter to continuously update object beliefs during planning and execution.
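To make the belief-update idea concrete, here is a minimal sketch of a discrete Bayesian filter over candidate object locations. It is an illustration under stated assumptions, not the CoCo-TAMP implementation: the location names, the prior weights (a stand-in for an LLM-derived prior), the `visibility` fraction, and the `detect_prob` parameter are all hypothetical.

```python
def normalize(belief):
    """Rescale a location -> weight mapping so the weights sum to 1."""
    total = sum(belief.values())
    if total <= 0:
        raise ValueError("belief has no probability mass")
    return {loc: p / total for loc, p in belief.items()}


def update_not_seen(belief, location, visibility, detect_prob=0.9):
    """Bayes update after looking at `location` and NOT detecting the object.

    visibility:  assumed fraction of the location inside the camera view (0..1).
    detect_prob: assumed chance of detecting the object when it is in view.
    """
    # Probability of a miss if the object really is at `location`.
    miss_if_present = 1.0 - visibility * detect_prob
    posterior = {
        loc: p * (miss_if_present if loc == location else 1.0)
        for loc, p in belief.items()
    }
    return normalize(posterior)


# Hypothetical prior for a target object; the weights are made up for
# illustration and merely stand in for an LLM-derived location prior.
prior = normalize({"fridge": 6.0, "counter": 3.0, "bookshelf": 1.0})

# The robot sees half of the fridge interior and does not find the object.
posterior = update_not_seen(prior, "fridge", visibility=0.5)
```

Because only half of the fridge was visible, the belief in "fridge" drops but is not eliminated, and the remaining mass shifts to the unobserved locations.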

Key results

62.7% reduction in planning and execution time in simulation
72.6% reduction in planning and execution time in real-world demonstrations
Hierarchical Bayesian filter with visibility-aware observation models accurately tracks object states under partial coverage
Dynamic co-location toggler improves robustness by disabling similarity updates for widely dispersed objects
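The co-location idea and its toggle can be sketched as a similarity-weighted belief update. This is a hedged illustration only: the exponential weighting, the `strength` parameter, the similarity scale in [0, 1], and the boolean `enabled` flag are assumptions made for this example, and how the toggle is actually decided is not specified here.

```python
import math


def colocation_update(belief, seen_at, similarity, enabled=True, strength=2.0):
    """Shift belief about a target object after another object is seen.

    seen_at:    location where a different object was just observed.
    similarity: assumed semantic similarity to the target, in [0, 1].
    enabled:    stand-in for the co-location toggle; False skips the update.
    strength:   hypothetical gain controlling how strongly similarity counts.
    """
    if not enabled:
        return dict(belief)
    # Weight > 1 for similar objects (similarity > 0.5), < 1 for dissimilar.
    weight = math.exp(strength * (similarity - 0.5))
    scaled = {
        loc: p * (weight if loc == seen_at else 1.0)
        for loc, p in belief.items()
    }
    total = sum(scaled.values())
    return {loc: p / total for loc, p in scaled.items()}


# Hypothetical belief over where the target object is.
belief = {"fridge": 0.4, "counter": 0.4, "bookshelf": 0.2}

similar = colocation_update(belief, "fridge", similarity=0.9)
dissimilar = colocation_update(belief, "fridge", similarity=0.1)
disabled = colocation_update(belief, "fridge", similarity=0.9, enabled=False)
```

Observing a similar object raises the belief that the target shares its location, a dissimilar one lowers it, and with the toggle off the belief is left unchanged.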

Why it matters

Provides a practical pathway for robots to efficiently execute long-horizon manipulation tasks in cluttered, real-world environments by effectively bridging LLM common sense with probabilistic planning.

Abstract

Robot planning in partially observable environments, where not all objects are known or visible, is a challenging problem, as it requires reasoning under uncertainty through partially observable Markov decision processes. During the execution of a computed plan, a robot may unexpectedly observe task-irrelevant objects, which are typically ignored by naive planners. In this work, we propose incorporating two types of common-sense knowledge: (1) certain objects are more likely to be found in specific locations; and (2) similar objects are likely to be co-located, while dissimilar objects are less likely to be found together. Manually engineering such knowledge is complex, so we explore leveraging the powerful common-sense reasoning capabilities of large language models (LLMs). Our planning and execution framework, CoCo-TAMP, introduces a hierarchical state estimation that uses LLM-guided information to shape the belief over task-relevant objects, enabling efficient solutions to long-horizon task and motion planning problems. In experiments, CoCo-TAMP achieves an average reduction of 62.7% in planning and execution time in simulation, and 72.6% in real-world demonstrations, compared to a baseline that does not incorporate either type of common-sense knowledge.

Index terms

Task and Motion Planning