Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning
Yoonwoo Kim, Raghav Arora, Roberto Martín-Martín, Peter Stone, Ben Abbatematteo, Yoonchang Sung
AI summary
Problem
Deterministic task and motion planners fail in partially observable environments when objects are occluded or unseen, as they lack mechanisms to reason under uncertainty or leverage task-irrelevant observations.
Approach
CoCo-TAMP leverages large language models to generate initial location priors and semantic similarity-based co-location models, which guide a hierarchical Bayesian filter to continuously update object beliefs during planning and execution.
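As a rough illustration of how a location prior and a co-location signal could combine, here is a minimal sketch of one discrete Bayesian reweighting step. It is not the authors' implementation: the location names, the prior weights (standing in for LLM output), the similarity score, and the exponential likelihood ratio are all assumptions made for this example.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("belief has no probability mass")
    return {loc: w / total for loc, w in weights.items()}


def colocation_update(belief, seen_at, similarity):
    """Reweight the target's belief after another object is seen at `seen_at`.

    `similarity` is a hypothetical score in [-1, 1]. Positive values pull
    probability toward `seen_at`; negative values push it away. The
    exponential likelihood ratio is an assumption, not the paper's model.
    """
    ratio = math.exp(similarity)
    reweighted = {
        loc: p * (ratio if loc == seen_at else 1.0)
        for loc, p in belief.items()
    }
    return normalize(reweighted)


# Hypothetical prior over where a target object might be.
prior = normalize({"kitchen_counter": 0.5, "pantry": 0.3, "bathroom_shelf": 0.2})

# A similar object is observed in the pantry.
after_similar = colocation_update(prior, "pantry", similarity=0.8)

# A dissimilar object is observed on the bathroom shelf.
after_dissimilar = colocation_update(prior, "bathroom_shelf", similarity=-0.8)
```

In this sketch, seeing a similar object raises the target's probability at that location and seeing a dissimilar one lowers it, which matches the two kinds of common-sense knowledge the paper describes.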
Key results
- 62.7% reduction in planning and execution time in simulation
- 72.6% reduction in planning and execution time in real-world demonstrations
- Hierarchical Bayesian filter with visibility-aware observation models accurately tracks object states under partial coverage
- Dynamic co-location toggler improves robustness by disabling similarity updates for widely dispersed objects
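The last two points can be sketched under simple assumptions. The code below shows one possible visibility-aware update for a "looked but did not detect" observation, and one possible rule for switching off co-location updates when sightings of an object category are widely spread. The function names, the detection probability, and the entropy threshold are hypothetical and are not taken from the paper.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("belief has no probability mass")
    return {loc: w / total for loc, w in weights.items()}


def no_detection_update(belief, looked_at, visible_fraction, p_detect=0.9):
    """Update the belief after looking at `looked_at` and seeing nothing.

    Only `visible_fraction` of the location was in view, so a miss is weak
    evidence when coverage is low. `p_detect` is an assumed detector recall.
    """
    p_miss = 1.0 - visible_fraction * p_detect
    reweighted = {
        loc: p * (p_miss if loc == looked_at else 1.0)
        for loc, p in belief.items()
    }
    return normalize(reweighted)


def colocation_enabled(sighting_counts, max_normalized_entropy=0.8):
    """Return False when sightings are spread too evenly across locations.

    Dispersion is measured with normalized Shannon entropy, an assumed
    stand-in for whatever criterion the actual toggler uses.
    """
    total = sum(sighting_counts.values())
    if total == 0 or len(sighting_counts) < 2:
        return True
    entropy = -sum(
        (c / total) * math.log(c / total)
        for c in sighting_counts.values()
        if c > 0
    )
    return entropy / math.log(len(sighting_counts)) <= max_normalized_entropy


uniform = normalize({"drawer": 1.0, "shelf": 1.0, "table": 1.0})
partial_view = no_detection_update(uniform, "drawer", visible_fraction=0.5)
full_view = no_detection_update(uniform, "drawer", visible_fraction=1.0, p_detect=1.0)
```

With half of the drawer visible, its probability drops but stays above zero. With full coverage and a perfect detector, it drops to zero.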
Why it matters
Provides a practical pathway for robots to efficiently execute long-horizon manipulation tasks in cluttered, real-world environments by effectively bridging LLM common sense with probabilistic planning.
Abstract
Robot planning in partially observable environments, where not all objects are known or visible, is a challenging problem, as it requires reasoning under uncertainty through partially observable Markov decision processes. During the execution of a computed plan, a robot may unexpectedly observe task-irrelevant objects, which are typically ignored by naive planners. In this work, we propose incorporating two types of common-sense knowledge: (1) certain objects are more likely to be found in specific locations; and (2) similar objects are likely to be co-located, while dissimilar objects are less likely to be found together. Manually engineering such knowledge is complex, so we explore leveraging the powerful common-sense reasoning capabilities of large language models (LLMs). Our planning and execution framework, CoCo-TAMP, introduces a hierarchical state estimation that uses LLM-guided information to shape the belief over task-relevant objects, enabling efficient solutions to long-horizon task and motion planning problems. In experiments, CoCo-TAMP achieves an average reduction of 62.7% in planning and execution time in simulation, and 72.6% in real-world demonstrations, compared to a baseline that does not incorporate either type of common-sense knowledge.