ICRA 2026

GPT-PDDL: Towards Executable Robot Task Planning

Chang Sik Lee, Hye-Kyung Cho, SUJEONG YOU

AI summary

Integrating LLM video understanding with PDDL planning converts human demonstration videos directly into physically feasible, executable robot task plans while mitigating LLM hallucinations.
LLM, PDDL, Robot Task Planning, Human Demonstration, Vision-Language Models, Plan Verification

Problem

Traditional LLM-based planning for robot tasks from videos suffers from hallucinations and unverified physical feasibility, while existing methods rely on manual PDDL definitions or fixed action primitives. Robots struggle to generalize task plans from human demonstrations without time-consuming manual programming or sensor tracking.

Approach

GPT-PDDL processes human demonstration videos using a Vision-Language Model to extract mid-level action sequences, then automatically maps these natural language descriptions to PDDL templates to generate and verify executable robot plans using a classical planner.
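As a rough illustration of the mapping step described above, the sketch below turns natural-language action descriptions into grounded PDDL-style action strings using pattern templates. The template table, the action names (`pick-up`, `place-on`), and the function `to_pddl_step` are hypothetical, chosen for illustration only; they are not taken from the paper, and the actual system may map descriptions to templates quite differently.

```python
import re

# Hypothetical templates (an assumption, not from the paper):
# each pairs a sentence pattern with a PDDL action name.
TEMPLATES = [
    (re.compile(r"pick up the (\w+)"), "pick-up"),
    (re.compile(r"place the (\w+) on the (\w+)"), "place-on"),
]

def to_pddl_step(description):
    """Map one natural-language step to a grounded PDDL action string.

    Returns None when no template matches, so unmapped steps can be flagged
    instead of being silently guessed.
    """
    text = description.lower().strip().rstrip(".")
    for pattern, action in TEMPLATES:
        match = pattern.fullmatch(text)
        if match:
            return "(" + " ".join((action,) + match.groups()) + ")"
    return None

# Example mid-level action sequence, as a VLM might describe it (invented here).
steps = ["Pick up the block.", "Place the block on the plate."]
plan = [to_pddl_step(s) for s in steps]
print(plan)
```

Returning `None` for unmatched descriptions keeps the mapping conservative: a step that fits no template is surfaced for handling rather than converted into an action the planner cannot check.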

Key results

  • Automated extraction of mid-level action sequences from human demonstration videos
  • Automatic mapping of natural language actions to PDDL schemas with object type classification
  • Verification of plan executability and physical feasibility via classical PDDL solvers
  • Successful task planning on five benchmark tasks from the RH20T dataset
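The verification idea in the list above can be sketched with a small STRIPS-style plan validator. This is a simplification under stated assumptions: it only simulates preconditions and add/delete effects for a given plan, whereas the paper uses a classical PDDL solver. The action definitions, predicate names, and `validate_plan` are hypothetical.

```python
def validate_plan(state, plan, actions):
    """Simulate a plan step by step.

    Returns (True, None) if every step's preconditions hold when it runs,
    otherwise (False, index_of_first_failing_step).
    """
    state = set(state)
    for i, (name, args) in enumerate(plan):
        pre, add, delete = actions[name](*args)
        if not pre <= state:  # some precondition is not satisfied
            return False, i
        state = (state - delete) | add
    return True, None

# Hypothetical actions: each returns (preconditions, add effects, delete effects).
def pick_up(obj):
    return ({("hand-empty",), ("on-table", obj)},
            {("holding", obj)},
            {("hand-empty",), ("on-table", obj)})

def place_on(obj, target):
    return ({("holding", obj)},
            {("on", obj, target), ("hand-empty",)},
            {("holding", obj)})

ACTIONS = {"pick-up": pick_up, "place-on": place_on}
initial = {("hand-empty",), ("on-table", "block")}

good = [("pick-up", ("block",)), ("place-on", ("block", "plate"))]
bad = [("place-on", ("block", "plate"))]  # places an object that was never picked up

print(validate_plan(initial, good, ACTIONS))
print(validate_plan(initial, bad, ACTIONS))
```

A plan that skips a required step, such as placing an object that was never grasped, fails at the first unsatisfied precondition. Symbolic verification is meant to catch this kind of hallucinated step before execution.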

Why it matters

Enables robots to learn complex tasks directly from human videos with high reliability, reducing manual programming and bridging the gap between semantic understanding and physical execution.

Abstract

Given the recent significant advancements in the video understanding capabilities of Large Language Models (LLMs), there is growing interest in research that automatically generates executable robot task plans from human demonstration videos. Existing LLM-based symbolic planning approaches often rely on manually defined Planning Domain Definition Language (PDDL) domains or fixed action primitives. This paper proposes GPT-PDDL, a framework that infers step-by-step task procedures from demonstration videos and converts them into robot plans based on PDDL.

Index terms

AI-Based Methods, Assembly, Visual Learning