ICRA 2026

Hierarchical LLM-Based Multi-Agent Framework with Prompt Optimization for Multi-Robot Task Planning

Tomoya Kawabe, Rin Takano

AI summary

A hierarchical multi-agent LLM planner with iterative prompt optimization outperforms the previous state of the art, LaMMA-P, on the MAT-THOR multi-robot task planning benchmark by combining hierarchical task decomposition with automated, feedback-driven prompt refinement.
Multi-robot planning, Large language models, Prompt optimization, Hierarchical agents, TextGrad, PDDL

Problem

Existing multi-robot planners either lack the flexibility to handle ambiguous, long-horizon natural language instructions or suffer from hallucinations and scalability bottlenecks due to centralized architectures and open-loop execution.

Approach

The framework distributes task decomposition and planning across a hierarchy of LLM agents, validates generated PDDL plans with a classical solver, and automatically refines agent prompts using textual gradients when failures occur, while sharing meta-prompts across peer agents for faster adaptation.
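The loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Agent` and `Layer` classes, the `plan_subtask` and `plan_task` functions, and the `fake_llm`, `fake_critic`, and `fake_solver` stubs are all hypothetical names introduced here. In particular, the sketch assumes that a textual gradient is plain natural-language feedback appended to a prompt, and that a meta-prompt is shared text prepended to every peer agent's prompt; the paper may structure both differently.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: a real system would call an LLM and a PDDL solver.
LLM = Callable[[str, str], str]          # (prompt, input text) -> output text
Critic = Callable[[str, str], str]       # (prompt, failed problem) -> feedback
Solver = Callable[[str], Optional[str]]  # PDDL problem -> plan, or None on failure


@dataclass
class Agent:
    prompt: str


@dataclass
class Layer:
    agents: List[Agent]
    meta_prompt: str = ""  # assumed here to be text shared by all peers in the layer


def plan_subtask(agent: Agent, layer: Layer, subtask: str, llm: LLM,
                 critic: Critic, solver: Solver,
                 max_rounds: int = 3) -> Tuple[Optional[str], int]:
    """Generate a PDDL problem, validate it, and refine the prompt on failure."""
    for attempt in range(1, max_rounds + 1):
        full_prompt = (layer.meta_prompt + "\n" + agent.prompt).strip()
        problem = llm(full_prompt, subtask)
        plan = solver(problem)
        if plan is not None:
            return plan, attempt
        # Textual "gradient": natural-language feedback on the failed attempt.
        feedback = critic(agent.prompt, problem)
        agent.prompt += "\n" + feedback
        # Share the same lesson with peer agents in this layer.
        layer.meta_prompt = (layer.meta_prompt + "\n" + feedback).strip()
    return None, max_rounds


def plan_task(task: str, upper: Agent, lower: Layer, llm: LLM,
              critic: Critic, solver: Solver) -> List[Tuple[Optional[str], int]]:
    """Upper layer decomposes the task; lower-layer agents plan each subtask."""
    subtasks = [s.strip() for s in llm(upper.prompt, task).split(";") if s.strip()]
    results = []
    for i, subtask in enumerate(subtasks):
        agent = lower.agents[i % len(lower.agents)]  # simple round-robin assignment
        results.append(plan_subtask(agent, lower, subtask, llm, critic, solver))
    return results


# --- Stubs standing in for the LLM, the feedback model, and the solver ---
def fake_llm(prompt: str, text: str) -> str:
    if "decompose" in prompt:
        return "pick up mug; wash plate"
    # The stub only emits a goal once its prompt has been told to include one.
    goal = "(:goal (done))" if "include a :goal" in prompt else ""
    return f"(define (problem demo) {goal})  ; {text}"


def fake_critic(prompt: str, problem: str) -> str:
    return "Always include a :goal section."


def fake_solver(problem: str) -> Optional[str]:
    return "plan-ok" if ":goal" in problem else None


upper = Agent("decompose the task into subtasks separated by ';'")
lower = Layer([Agent("write a PDDL problem"), Agent("write a PDDL problem")])
results = plan_task("tidy the kitchen", upper, lower, fake_llm, fake_critic, fake_solver)
print(results)  # [('plan-ok', 2), ('plan-ok', 1)]
```

In this toy run the first lower-layer agent needs two attempts, while the second succeeds on its first attempt because it inherits the feedback through the shared meta-prompt. That is the intuition behind sharing across peers, shown here under the stated assumptions only.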

Key results

  • Achieves 95%, 84%, and 60% success rates on compound, complex, and vague tasks on the MAT-THOR benchmark
  • Outperforms the previous state of the art, LaMMA-P, by 2, 7, and 15 percentage points on those three task types, respectively
  • An ablation study attributes roughly +59, +37, and +4 percentage points of overall success rate to the hierarchical structure, textual-gradient prompt optimization, and meta-prompt sharing, respectively
  • Enables scalable, feedback-driven replanning for heterogeneous multi-robot teams without centralized bottlenecks
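
As a quick arithmetic check, the LaMMA-P success rates implied by the Abstract's figures can be worked out as below. This assumes the reported gains of 2, 7, and 15 percentage points are absolute differences from the planner's own success rates; the resulting baseline values are derived here, not quoted from the paper, and may differ slightly from the published ones due to rounding.

```python
# Success rates reported for the proposed planner, in percent.
ours = {"compound": 95, "complex": 84, "vague": 60}

# Reported improvement over LaMMA-P, in percentage points.
gain_pp = {"compound": 2, "complex": 7, "vague": 15}

# Implied baseline = ours - gain (assumes absolute percentage-point differences).
implied_baseline = {task: ours[task] - gain_pp[task] for task in ours}
print(implied_baseline)  # {'compound': 93, 'complex': 77, 'vague': 45}
```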

Why it matters

Enables reliable, scalable natural-language tasking for heterogeneous robot teams in real-world applications like warehouse automation and household assistance.

Abstract

Multi-robot task planning requires decomposing natural-language instructions into executable actions for heterogeneous robot teams. Conventional Planning Domain Definition Language (PDDL) planners provide rigorous guarantees but struggle to handle ambiguous or long-horizon missions, while large language models (LLMs) can interpret instructions and propose plans but may hallucinate or produce infeasible actions. We present a hierarchical multi-agent LLM-based planner with prompt optimization: an upper layer decomposes tasks and assigns them to lower-layer agents, which generate PDDL problems solved by a classical planner. When plans fail, the system applies TextGrad-inspired textual-gradient updates to optimize each agent’s prompt and thereby improve planning accuracy. In addition, meta-prompts are learned and shared across agents within the same layer, enabling efficient prompt optimization in multi-agent settings. On the MAT-THOR benchmark, our planner achieves success rates of 0.95 on compound tasks, 0.84 on complex tasks, and 0.60 on vague tasks, improving over the previous state-of-the-art LaMMA-P by 2, 7, and 15 percentage points respectively. An ablation study shows that the hierarchical structure, prompt optimization, and meta-prompt sharing contribute roughly +59, +37, and +4 percentage points to the overall success rate.

Index terms

Agent-Based Systems, Task Planning, Multi-Robot Systems
