GPT-PDDL: Towards Executable Robot Task Planning
Chang Sik Lee, Hye-Kyung Cho, Sujeong You
AI summary
Problem
Conventional LLM-based planning of robot tasks from videos suffers from hallucinations and does not verify physical feasibility, while existing methods rely on manually written PDDL definitions or fixed action primitives. As a result, robots struggle to generalize task plans from human demonstrations without time-consuming manual programming or sensor-based tracking.
Approach
GPT-PDDL processes human demonstration videos with a Vision-Language Model (VLM) to extract mid-level action sequences. It then automatically maps these natural-language descriptions to PDDL templates and uses a classical planner to generate and verify executable robot plans.
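To make the mapping step more concrete, here is a minimal sketch of turning VLM-produced step descriptions into grounded PDDL-style actions. It assumes a hypothetical verb lexicon (`TEMPLATES`) and plain keyword matching; the names `map_step` and `GroundedAction` are illustrative only, and GPT-PDDL's actual templates and mapping procedure are not reproduced here.

```python
import re
from dataclasses import dataclass

# Hypothetical verb lexicon: surface verbs -> PDDL action template names.
TEMPLATES = {
    "pick": "pick", "grasp": "pick",
    "place": "place", "put": "place",
    "open": "open", "close": "close",
}

# Function words dropped when extracting action arguments (illustrative list).
STOPWORDS = {"the", "a", "an", "up", "on", "onto", "in", "into", "from", "to"}


@dataclass(frozen=True)
class GroundedAction:
    name: str
    args: tuple

    def to_pddl(self) -> str:
        # Render as a PDDL-style grounded action, e.g. "(pick red_cup)".
        return "(" + " ".join((self.name, *self.args)) + ")"


def map_step(step: str) -> GroundedAction:
    """Map one natural-language step to a grounded action via keyword matching."""
    tokens = re.findall(r"[a-z_]+", step.lower())
    for i, tok in enumerate(tokens):
        if tok in TEMPLATES:
            # Remaining content words after the verb become the action's arguments.
            args = tuple(t for t in tokens[i + 1:] if t not in STOPWORDS)
            return GroundedAction(TEMPLATES[tok], args)
    raise ValueError(f"no PDDL template matches step: {step!r}")


steps = ["Pick up the red_cup", "Place the red_cup on the tray"]
print([map_step(s).to_pddl() for s in steps])
# ['(pick red_cup)', '(place red_cup tray)']
```

Keyword matching is deliberately simple here; a step whose verb is missing from the lexicon raises `ValueError` rather than silently producing an action, which mirrors the idea that only template-backed actions should reach the planner.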
Key results
- Automated extraction of mid-level action sequences from human demonstration videos
- Automatic mapping of natural-language actions to PDDL schemas with object type classification
- Verification of plan executability and physical feasibility via classical PDDL solvers
- Successful task planning on five benchmark tasks from the RH20T dataset
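The verification idea can be illustrated with a simplified STRIPS-style check: each grounded action's preconditions must hold in the current state before its delete and add effects are applied, and the goal must hold once the plan ends. This is a hand-written sketch with hypothetical operators and string-encoded facts, not the classical solver used by GPT-PDDL (the summary does not name it); in practice a planner or plan validator would read the PDDL domain and problem files directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """A grounded STRIPS-style action: preconditions, add and delete effects."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset


def validate(plan, init, goal):
    """Simulate the plan from `init`; return (ok, message)."""
    state = set(init)
    for i, op in enumerate(plan):
        missing = op.pre - state
        if missing:
            return False, f"step {i} ({op.name}) has unmet preconditions: {sorted(missing)}"
        # Apply effects: delete first, then add (standard STRIPS semantics).
        state = (state - op.delete) | op.add
    unmet = set(goal) - state
    if unmet:
        return False, f"goal not reached: {sorted(unmet)}"
    return True, "plan is executable and reaches the goal"


# Hypothetical grounded operators for a cup-to-tray task.
pick = Operator("pick cup",
                pre=frozenset({"handempty", "on cup table"}),
                add=frozenset({"holding cup"}),
                delete=frozenset({"handempty", "on cup table"}))
place = Operator("place cup tray",
                 pre=frozenset({"holding cup"}),
                 add=frozenset({"on cup tray", "handempty"}),
                 delete=frozenset({"holding cup"}))

init = {"handempty", "on cup table"}
goal = {"on cup tray"}
print(validate([pick, place], init, goal))
print(validate([place], init, goal))
```

Under these assumptions, `validate([place], init, goal)` rejects the plan at step 0 because `holding cup` is not yet true, which is the kind of infeasible step a symbolic check can catch before anything is executed on a robot.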
Why it matters
Enables robots to learn complex tasks directly from human videos while checking each plan with a classical planner, reducing manual programming and bridging the gap between semantic understanding and physical execution.
Abstract
Given recent significant advances in the video understanding capabilities of Large Language Models (LLMs), there is growing interest in automatically generating executable robot task plans from human demonstration videos. Existing LLM-based symbolic planning approaches often rely on manually defined Planning Domain Definition Language (PDDL) domains or fixed action primitives. This paper proposes GPT-PDDL, a framework that infers step-by-step task procedures from demonstration videos and converts them into PDDL-based robot plans.
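As a rough illustration of the conversion step, the sketch below classifies extracted object names into PDDL types and renders a typed PDDL problem. The keyword-based `classify` function, the `LEXICON` entries, and the `kitchen` domain name are all hypothetical; the abstract only states that the framework converts inferred procedures into PDDL-based plans.

```python
# Hypothetical keyword -> PDDL type lexicon for object type classification.
LEXICON = {"cup": "container", "bowl": "container", "tray": "surface",
           "table": "surface", "drawer": "receptacle"}


def classify(obj: str, lexicon: dict, default: str = "object") -> str:
    """Assign a PDDL type by keyword match on the object name."""
    for keyword, pddl_type in lexicon.items():
        if keyword in obj:
            return pddl_type
    return default


def to_problem(name, domain, objects, init, goal, lexicon=LEXICON):
    """Render a typed PDDL problem; facts are space-separated strings."""
    by_type = {}
    for obj in objects:
        by_type.setdefault(classify(obj, lexicon), []).append(obj)
    # One ":objects" line per type, e.g. "table tray - surface".
    objs = "\n    ".join(f"{' '.join(sorted(names))} - {typ}"
                         for typ, names in sorted(by_type.items()))
    init_s = " ".join(f"({f})" for f in sorted(init))
    goal_s = " ".join(f"({f})" for f in sorted(goal))
    return (f"(define (problem {name})\n"
            f"  (:domain {domain})\n"
            f"  (:objects\n    {objs})\n"
            f"  (:init {init_s})\n"
            f"  (:goal (and {goal_s})))")


print(to_problem("move-cup", "kitchen", ["red_cup", "table", "tray"],
                 {"handempty", "on red_cup table"}, {"on red_cup tray"}))
```

Typing objects this way lets a domain restrict which actions apply to which objects (for example, only a `container` can be picked up), which is one plausible reason a pipeline like this would classify objects before planning.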