Kinodynamic Task and Motion Planning Using VLM-Guided and Interleaved Sampling
Minseo Kwon, Young J. Kim
AI summary
Problem
Traditional TAMP planners suffer from computational bottlenecks in long-horizon tasks, while LLM-based methods lack the 3D spatial reasoning needed to guarantee geometric and dynamic feasibility.
Approach
The method unifies symbolic task decisions and continuous motion parameters in a hybrid search tree, validating each step with a physics simulator and using a VLM to guide exploration and recover from failures.
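The loop described above can be sketched in Python under loud assumptions. The toy 1-D pushing domain, the `simulate` function (a stand-in for the motion planner and physics simulator), and `distance_guide` (a stand-in for the VLM, which in the paper reasons over rendered images) are all hypothetical, as are every name, limit, and the best-first strategy. It is an illustration of interleaving symbolic choices with sampled motion parameters in one tree, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """One node of a hybrid tree: symbolic facts plus a numeric state."""
    symbolic: frozenset                 # e.g. frozenset({"at_goal"})
    numeric: float                      # toy continuous state: object position
    parent: Optional["Node"] = None
    action: Optional[Tuple[str, float]] = None  # (symbolic action, sampled parameter)


def simulate(x: float, delta: float) -> Optional[float]:
    """Hypothetical feasibility check; returns the next state or None."""
    if abs(delta) > 1.0:                # assumed per-step motion limit
        return None
    nx = x + delta
    if 2.2 < nx < 2.8:                  # assumed region where the object may not rest
        return None
    return nx


def distance_guide(node: Node, goal: float) -> float:
    """Stand-in for the VLM: higher score means more promising."""
    return -abs(node.numeric - goal)


def extract_plan(node: Node) -> List[Tuple[str, float]]:
    """Walk parent pointers back to the root and return the action sequence."""
    steps = []
    while node.parent is not None:
        steps.append(node.action)
        node = node.parent
    return steps[::-1]


def plan(start: float, goal: float, guide: Callable[[Node, float], float],
         samples_per_action: int = 4, max_expansions: int = 500,
         tol: float = 0.25, seed: int = 0) -> Optional[List[Tuple[str, float]]]:
    rng = random.Random(seed)
    frontier = [Node(frozenset(), start)]
    for _ in range(max_expansions):
        if not frontier:
            return None
        # The guide chooses where to resume. Picking a node that is not a
        # child of the last expansion is what backtracking looks like here.
        node = max(frontier, key=lambda n: guide(n, goal))
        frontier.remove(node)
        if "at_goal" in node.symbolic:
            return extract_plan(node)
        # Interleave the symbolic choice (which action) with sampling of its
        # continuous parameter, validating every sample before adding it.
        for name, sign in (("push_right", 1.0), ("push_left", -1.0)):
            for _ in range(samples_per_action):
                delta = sign * rng.uniform(0.1, 1.2)
                nx = simulate(node.numeric, delta)
                if nx is None:
                    continue            # infeasible sample is discarded
                facts = frozenset({"at_goal"}) if abs(nx - goal) <= tol else frozenset()
                frontier.append(Node(facts, nx, node, (name, delta)))
    return None
```

Calling `plan(0.0, 4.0, distance_guide)` returns a list of `(action, parameter)` pairs, or `None` if the expansion budget runs out. Because each sampled step is checked before its node enters the tree, any returned plan is feasible under the toy `simulate` by construction.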
Key results
- 32.14% to 1166.67% increase in average success rates over baselines
- Reduced planning time on complex manipulation tasks
- Ablations highlight the benefit of VLM-guided backtracking for failure recovery
- Successful real-world deployment on a physical robot
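To put the reported range in perspective, the short Python sketch below converts a percentage increase into a multiplicative factor. It assumes the figures are relative increases over each baseline's average success rate. The source does not say so explicitly, but an increase above 100% cannot be a percentage-point gain on a rate capped at 100%. The function names are illustrative only.

```python
def increase_to_factor(percent_increase: float) -> float:
    """Convert a relative increase in percent into a multiplicative factor."""
    return 1.0 + percent_increase / 100.0


def implied_baseline_ceiling(percent_increase: float) -> float:
    """Largest baseline success rate (in %) whose improved rate stays <= 100%.

    Assumes the increase is relative to the baseline rate.
    """
    return 100.0 / increase_to_factor(percent_increase)


low_factor = increase_to_factor(32.14)       # smallest reported improvement
high_factor = increase_to_factor(1166.67)    # largest reported improvement
high_ceiling = implied_baseline_ceiling(1166.67)
```

Under this reading, the largest reported improvement corresponds to a success rate roughly 12.67 times the baseline's. That is only possible if that baseline's average success rate was below about 7.9%.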
Why it matters
Provides a scalable, physically reliable planning framework for complex robotic manipulation tasks that require both high-level reasoning and dynamic feasibility.
Abstract
Task and Motion Planning (TAMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly in long-horizon problems due to excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic TAMP planner based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, enabling task and motion decisions to be made jointly. Kinodynamic constraints embedded in the TAMP problem are verified by an off-the-shelf motion planner and physics simulator, and a VLM guides the search for a TAMP solution and backtracks it based on visual renderings of the states. Experiments in simulated domains and in the real world show average success rates 32.14% to 1166.67% higher than those of traditional and LLM-based TAMP planners, along with reduced planning time on complex problems; ablations further highlight the benefits of VLM backtracking. More details are available at https://graphics.ewha.ac.kr/kinodynamicTAMP/.