ICRA 2026

Kinodynamic Task and Motion Planning Using VLM-Guided and Interleaved Sampling

Minseo Kwon, Young J. Kim

AI summary

Integrating a hybrid state tree with VLM-guided backtracking and interleaved sampling raises average success rates by 32.14% to 1166.67% over baselines and reduces planning time on complex kinodynamic manipulation problems.

Task and Motion Planning, Kinodynamic Planning, Vision-Language Models, Hybrid State Tree, Robotics, Interleaved Sampling

Problem

Traditional TAMP planners suffer from computational bottlenecks in long-horizon tasks, while LLM-based methods lack the 3D spatial reasoning needed to guarantee geometric and dynamic feasibility.

Approach

The method unifies symbolic task decisions and continuous motion parameters in a hybrid search tree, validating each step with a physics simulator and using a VLM to guide exploration and recover from failures.
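The idea of a single tree over symbolic and numeric states can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D toy domain, the names `HybridNode`, `applicable_actions`, `simulate`, `guide_score`, and `search`, and all constants are assumptions made for illustration. `simulate` stands in for the motion planner and physics simulator, and `guide_score` stands in for VLM guidance.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

GOAL, TOL = 3.0, 0.1  # toy goal position and tolerance (assumed values)


@dataclass
class HybridNode:
    """One tree node holding symbolic facts and a numeric state together."""
    symbolic: frozenset                        # e.g. {"holding"}
    numeric: float                             # toy 1-D object position
    action: Optional[Tuple[str, float]] = None
    parent: Optional["HybridNode"] = None

    def plan(self) -> List[Tuple[str, float]]:
        """Walk parent links back to the root and return the action sequence."""
        steps, node = [], self
        while node.parent is not None:
            steps.append(node.action)
            node = node.parent
        return steps[::-1]


def applicable_actions(node: HybridNode) -> List[str]:
    """Symbolic preconditions decide which action types may be tried."""
    return ["move"] if "holding" in node.symbolic else ["grasp"]


def simulate(node: HybridNode, name: str, param: float) -> Optional[float]:
    """Hypothetical feasibility check: next numeric state, or None if rejected."""
    if name == "grasp":
        return node.numeric
    nxt = node.numeric + param
    if abs(param) > 1.0 or not 0.0 <= nxt <= 4.0:  # toy step and workspace limits
        return None
    return nxt


def guide_score(node: HybridNode) -> float:
    """Hypothetical stand-in for VLM guidance: higher means more promising."""
    bonus = 1.0 if "holding" in node.symbolic else 0.0
    return bonus - abs(GOAL - node.numeric)


def search(max_iters: int = 2000, seed: int = 0):
    rng = random.Random(seed)
    tree = [HybridNode(frozenset(), 0.0)]
    for _ in range(max_iters):
        # Re-select over the whole tree, so the search can return to an earlier node.
        node = max(tree, key=guide_score)
        if "holding" in node.symbolic and abs(GOAL - node.numeric) <= TOL:
            return node.plan()
        # Interleave the symbolic choice with a sampled continuous parameter.
        name = rng.choice(applicable_actions(node))
        param = rng.uniform(-1.5, 1.5) if name == "move" else 0.0
        nxt = simulate(node, name, param)
        if nxt is None:
            continue  # infeasible sample: discard it and try again
        facts = node.symbolic | {"holding"} if name == "grasp" else node.symbolic
        tree.append(HybridNode(frozenset(facts), nxt, (name, param), node))
    return None
```

Calling `search()` returns a list of `(action, parameter)` pairs that starts with a grasp and is followed by moves, each of which passed the toy feasibility check. In this toy domain the greedy score has no dead ends, so selection rarely revisits an older node; in a real planner, the selection step is where guided backtracking would take effect.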

Key results

32.14% to 1166.67% increase in average success rates over baselines
Reduced planning time on complex manipulation tasks
Ablations highlight the benefit of VLM-guided backtracking
Successful real-world deployment on a physical robot
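To read the reported range of increases in average success rate, it can help to convert a percentage increase into a multiplicative factor. The snippet below is only an arithmetic aid; the function name `ratio_from_percent_increase` is introduced here for illustration and does not come from the paper.

```python
def ratio_from_percent_increase(pct: float) -> float:
    """Convert a percentage increase into the ratio new / baseline."""
    return 1.0 + pct / 100.0


low = ratio_from_percent_increase(32.14)     # about 1.32 times the baseline
high = ratio_from_percent_increase(1166.67)  # about 12.67 times the baseline
```

A 32.14% increase therefore corresponds to roughly 1.3 times a baseline's average success rate, and a 1166.67% increase to roughly 12.7 times.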

Why it matters

Provides a scalable, physically reliable planning framework for complex robotic manipulation tasks that require both high-level reasoning and dynamic feasibility.

Abstract

Task and Motion Planning (TAMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly in long-horizon problems due to excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic TAMP planner based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, enabling task and motion decisions to be made jointly. Kinodynamic constraints embedded in the TAMP problem are verified by an off-the-shelf motion planner and physics simulator, and a VLM guides the exploration toward a TAMP solution and backtracks the search based on visual renderings of the states. Experiments in simulated domains and in the real world show average success rates 32.14% to 1166.67% higher than those of traditional and LLM-based TAMP planners, along with reduced planning time on complex problems; ablations further highlight the benefits of VLM backtracking. More details are available at https://graphics.ewha.ac.kr/kinodynamicTAMP/.

Index terms

Task and Motion Planning, Task Planning