ICRA 2026

VeriGraph: Scene Graphs for Execution Verifiable Robot Planning

Daniel Ekpo, Mara Levy, Saksham Suri, Chuong Huynh, Archana Swaminathan, Abhinav Shrivastava

AI summary

VeriGraph significantly improves robot task completion by using scene graphs to iteratively verify and correct VLM-generated plans.

Scene graphs, Robot planning, Vision-language models, Execution verification, Constraint-aware planning

Problem

Vision-language models frequently generate incorrect or infeasible action sequences for long-horizon manipulation tasks due to a lack of explicit spatial and physical constraint reasoning.

Approach

The framework converts initial scenes and goals into structured scene graphs, then iteratively validates and refines LLM-generated action sequences against graph-based constraints before execution.
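The validate-and-refine loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy scene graph (a mapping from each object to what it rests on), the `Move` action, the two stacking constraints, and the `scripted_planner` stand-in for the LLM planner are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Assumed toy scene graph: maps each object to the thing it rests on.
SceneGraph = Dict[str, str]
TABLE = "table"  # hypothetical support surface that is always available


@dataclass(frozen=True)
class Move:
    """Hypothetical action: pick up `obj` and place it on `dest`."""
    obj: str
    dest: str


def is_clear(graph: SceneGraph, name: str) -> bool:
    """An object is clear when nothing rests on it."""
    return name not in graph.values()


def simulate(graph: SceneGraph, plan: List[Move]) -> Tuple[SceneGraph, Optional[str]]:
    """Apply the plan to a copy of the graph; stop at the first violated constraint."""
    state = dict(graph)
    for i, step in enumerate(plan):
        if step.obj not in state:
            return state, f"step {i}: unknown object {step.obj}"
        if step.dest == step.obj or (step.dest != TABLE and step.dest not in state):
            return state, f"step {i}: invalid destination {step.dest}"
        if not is_clear(state, step.obj):
            return state, f"step {i}: {step.obj} has something on top of it"
        if step.dest != TABLE and not is_clear(state, step.dest):
            return state, f"step {i}: {step.dest} has something on top of it"
        state[step.obj] = step.dest
    return state, None


Planner = Callable[[SceneGraph, SceneGraph, Optional[str]], List[Move]]


def verify_and_refine(graph: SceneGraph, goal: SceneGraph,
                      propose: Planner, max_rounds: int = 5) -> Optional[List[Move]]:
    """Ask the planner for a plan, check it on the graph, and feed errors back."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        plan = propose(graph, goal, feedback)
        final, error = simulate(graph, plan)
        if error is None:
            unmet = {o: d for o, d in goal.items() if final.get(o) != d}
            if not unmet:
                return plan  # every step is feasible and the goal graph is reached
            error = f"goal relations not reached: {unmet}"
        feedback = error
    return None


def scripted_planner(candidates: List[List[Move]]) -> Planner:
    """Stand-in for an LLM planner: returns pre-written plans one per call."""
    remaining = iter(candidates)

    def propose(graph: SceneGraph, goal: SceneGraph, feedback: Optional[str]) -> List[Move]:
        return next(remaining)

    return propose


# B sits on A, so moving A directly is infeasible until B is cleared away.
scene = {"A": TABLE, "B": "A", "C": TABLE}
goal = {"A": "C"}
planner = scripted_planner([
    [Move("A", "C")],                    # first attempt, rejected by the check
    [Move("B", TABLE), Move("A", "C")],  # corrected attempt
])
plan = verify_and_refine(scene, goal, planner)
```

In this sketch the first candidate is rejected because A is covered by B, and the second candidate passes both the per-step checks and the final goal comparison. A real planner would use the `feedback` string to revise its output rather than read from a fixed list.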

Key results

58% improvement over baseline methods on language-based tasks
56% improvement over baseline methods on tangram puzzle tasks
30% improvement over baseline methods on image-based tasks
Constraint-aware planning without human intervention

Why it matters

It enables reliable, scalable long-horizon robotic manipulation by bridging the gap between high-level VLM reasoning and physical execution constraints.

Abstract

Recent progress in vision-language models (VLMs) has opened new possibilities for robot task planning, but these models often produce incorrect action sequences. To address these limitations, we propose VeriGraph, a novel framework that integrates VLMs for robotic planning while verifying action feasibility. VeriGraph uses scene graphs as an intermediate representation to capture key objects and spatial relationships, enabling more reliable plan verification and refinement. The system generates a scene graph from input images and uses it to iteratively check and correct action sequences generated by an LLM-based task planner, ensuring constraints are respected and actions are executable. Our approach significantly enhances task completion rates across diverse manipulation scenarios, outperforming baseline methods by 58% on language-based tasks, 56% on tangram puzzle tasks, and 30% on image-based tasks. Qualitative results and code can be found at https://verigraph-agent.github.io/.

Index terms

Task and Motion Planning, Visual Learning