ICRA 2026

PSALM-V: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models

Wang Bill Zhu, Miaosen Chai, Ishika Singh, Robin Jia, Jesse Thomason

AI summary

PSALM-V autonomously induces symbolic action semantics through visual interaction, doubling plan success rates in partially observed environments without expert definitions.

neuro-symbolic planning, domain induction, large language models, visual robotics, PDDL, autonomous agents

Problem

Current LLM-augmented planning methods rely on unrealistic assumptions like full observability, predefined problem files, or explicit error messages, hindering their application to real-world visual and robotic tasks.

Approach

The system iteratively samples and executes plans in visual environments, synthesizes possible explanations for execution failures, and maintains a tree-structured belief over the semantics of each action, refining the induced PDDL domain until a goal state is reached.
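The loop described above can be illustrated with a deliberately small sketch. It is not the authors' implementation: the names `BeliefNode`, `toy_execute`, `propose_refinements`, and `induce_preconditions` are hypothetical, a fixed candidate list stands in for the LLM, a toy success/failure function stands in for the visual environment, and the belief tree is searched breadth-first. The only property it shares with the description is that hypotheses about an action's preconditions live in a tree and are refined from execution outcomes alone, without explicit error messages.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BeliefNode:
    """One hypothesis about an action's preconditions; children refine it."""
    preconditions: frozenset
    children: list = field(default_factory=list)
    rejected: bool = False


def toy_execute(action, state):
    """Hypothetical environment: returns only success or failure."""
    true_preconditions = {"pick": {"hand_empty", "near_object"}}
    return true_preconditions[action] <= state


def propose_refinements(node, candidates):
    """Stand-in for the LLM: each child adds one literal not yet in the hypothesis."""
    return [
        BeliefNode(node.preconditions | {literal})
        for literal in candidates
        if literal not in node.preconditions
    ]


def induce_preconditions(action, candidates, execute):
    """Breadth-first refinement of the belief tree until execution succeeds."""
    frontier = deque([BeliefNode(frozenset())])
    while frontier:
        node = frontier.popleft()
        # Assume the planner achieves exactly the hypothesized preconditions.
        state = set(node.preconditions)
        if execute(action, state):
            return node
        # Failure means the hypothesis is incomplete: reject it and expand it.
        node.rejected = True
        node.children = propose_refinements(node, candidates)
        frontier.extend(node.children)
    return None
```

Calling `induce_preconditions("pick", ["hand_empty", "near_object", "light_on"], toy_execute)` returns the first node whose hypothesized preconditions are sufficient for the toy action. A real system would also have to induce post-conditions and cope with partial observability, which this sketch ignores.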

Key results

Doubles ALFRED plan success rate from 37% to 74% under partial observability
Achieves 100% domain induction F1 score in RTFM and Overcooked-AI multi-agent settings
Recovers full BlocksWorld robot domain with 100% F1 and 66.7% goal completion
Supports complex logical connectors (and, or, when) for action semantics
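To make the F1 figures above concrete, here is a minimal sketch of one plausible way to score an induced domain: treat each pre- or post-condition as an `(action, kind, literal)` triple and compute set-based F1 against a reference domain. This matching scheme and the function name `induction_f1` are assumptions for illustration; the summary does not state the exact scoring protocol used in the paper.

```python
def induction_f1(predicted, gold):
    """Set-based F1 between predicted and reference condition triples."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference semantics for a single "pick" action.
gold = {
    ("pick", "pre", "hand_empty"),
    ("pick", "pre", "clear"),
    ("pick", "eff", "holding"),
    ("pick", "eff", "not hand_empty"),
}

# An induced domain that misses one effect and adds one spurious precondition.
predicted = (gold - {("pick", "eff", "not hand_empty")}) | {("pick", "pre", "light_on")}
```

Under this scheme a score of 100% means the induced and reference condition sets are identical. In the example, three of four predicted triples are correct and three of four reference triples are recovered, so precision and recall are both 0.75.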

Why it matters

Enables scalable, autonomous symbolic planning for embodied AI and robotics by eliminating the need for manual domain engineering and explicit environmental feedback.

Abstract

We propose PSALM-V, one of the first autonomous neuro-symbolic learning systems able to induce symbolic action semantics (i.e., pre- and post-conditions) in visual environments through interaction. PSALM-V bootstraps reliable symbolic planning without expert action definitions, using LLMs to generate heuristic plans and candidate symbolic semantics. Previous work has explored using large language models to generate action semantics for Planning Domain Definition Language (PDDL)-based symbolic planners. However, these approaches have primarily focused on text-based domains or relied on unrealistic assumptions, such as access to a predefined problem file, full observability, or explicit error messages. By contrast, PSALM-V dynamically infers PDDL problem files and domain action semantics by analyzing execution outcomes and synthesizing possible error explanations. The system iteratively generates and executes plans while maintaining a tree-structured belief over possible action semantics for each action, refining these beliefs until a goal state is reached. Simulated task-completion experiments in ALFRED demonstrate that PSALM-V increases the plan success rate from 37% (Claude-3.7) to 74% in partially observed setups. Results on two 2D game environments, RTFM and Overcooked-AI, show that PSALM-V improves step efficiency and succeeds in domain induction in multi-agent settings. PSALM-V correctly induces PDDL pre- and post-conditions for real-world robot BlocksWorld tasks, despite low-level manipulation failures from the robot. Videos and resources are available at https://psalmv.github.io/.
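The abstract notes that the PDDL problem file is inferred rather than given. As a rough sketch of what that could involve at the text level, the function below renders a problem file from whatever objects and facts have been observed so far. The helper `render_problem`, the domain name, and the object and predicate names are all hypothetical and are not taken from the paper.

```python
def render_problem(name, domain, objects, init, goal):
    """Render a PDDL problem string from observed objects and facts."""
    object_text = " ".join(f"{obj} - {typ}" for obj, typ in sorted(objects.items()))
    init_text = " ".join("(" + " ".join(fact) + ")" for fact in sorted(init))
    goal_text = " ".join("(" + " ".join(fact) + ")" for fact in goal)
    return (
        f"(define (problem {name})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {object_text})\n"
        f"  (:init {init_text})\n"
        f"  (:goal (and {goal_text})))"
    )


# Facts known before exploring: the fridge is visible and closed.
objects = {"fridge1": "receptacle"}
init = {("closed", "fridge1")}
goal = [("holding", "apple1")]

# After opening the fridge, the agent observes a new object and updates its belief.
objects["apple1"] = "item"
init = (init - {("closed", "fridge1")}) | {("open", "fridge1"), ("in", "apple1", "fridge1")}

problem_text = render_problem("fetch-apple", "household", objects, init, goal)
```

Regenerating the problem text after each observation keeps the planner's initial state in step with what the agent has actually seen, which is one simple way to handle partial observability in a symbolic pipeline.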

Index terms

Integrated Planning and Learning; Task Planning; Planning under Uncertainty