Hierarchical LLM-Based Multi-Agent Framework with Prompt Optimization for Multi-Robot Task Planning
Tomoya Kawabe, Rin Takano
AI summary
Problem
Existing multi-robot planners either lack the flexibility to handle ambiguous, long-horizon natural language instructions or suffer from hallucinations and scalability bottlenecks due to centralized architectures and open-loop execution.
Approach
The framework distributes task decomposition and planning across a hierarchy of LLM agents, validates generated PDDL plans with a classical solver, and automatically refines agent prompts using textual gradients when failures occur, while sharing meta-prompts across peer agents for faster adaptation.
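The decompose, validate, and refine loop described above can be sketched in Python. This is a minimal toy sketch under stated assumptions, not the authors' implementation: `decompose`, `llm_generate_problem`, `classical_planner`, `textual_gradient`, and `optimizer_step` are hypothetical deterministic stubs standing in for LLM calls and a real PDDL solver, and the shared meta-prompt is modeled as a plain string attached to a layer.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical "lesson" that prompt optimization discovers in this toy setup.
GOAL_RULE = "Always include a (:goal ...) section."


@dataclass
class Agent:
    name: str
    prompt: str  # per-agent prompt, refined by textual-gradient updates


@dataclass
class Layer:
    agents: list
    meta_prompt: str = ""  # shared by all peer agents in this layer


def decompose(instruction: str, layer: Layer) -> list:
    """Hypothetical upper-layer agent: split on 'and', assign round-robin."""
    subtasks = [s.strip() for s in instruction.split(" and ")]
    return [(layer.agents[i % len(layer.agents)], s) for i, s in enumerate(subtasks)]


def llm_generate_problem(agent: Agent, layer: Layer, subtask: str) -> str:
    """Stub for an LLM call that writes a PDDL problem from the prompts."""
    context = layer.meta_prompt + "\n" + agent.prompt
    goal = f"\n  (:goal ({subtask}))" if GOAL_RULE in context else ""
    return f"(define (problem {agent.name})\n  (:init (idle {agent.name})){goal})"


def classical_planner(problem: str) -> Optional[list]:
    """Stub for a PDDL planner: a problem without a goal yields no plan."""
    match = re.search(r"\(:goal \((.+?)\)\)", problem)
    return [match.group(1)] if match else None


def textual_gradient(prompt: str, feedback: str) -> str:
    """Stub critic: a natural-language 'gradient' explaining the failure."""
    return f"The prompt never asks for a goal; planner feedback: {feedback}"


def optimizer_step(prompt: str, gradient: str) -> str:
    """Stub optimizer: edit the prompt in the direction the critique points."""
    if "goal" in gradient and GOAL_RULE not in prompt:
        return f"{prompt} {GOAL_RULE}"
    return prompt


def plan_subtask(agent: Agent, layer: Layer, subtask: str, max_iters: int = 3):
    """Generate a problem, validate it with the planner, refine the prompt on failure."""
    for attempt in range(1, max_iters + 1):
        plan = classical_planner(llm_generate_problem(agent, layer, subtask))
        if plan is not None:
            return plan, attempt
        gradient = textual_gradient(agent.prompt, "no plan found: missing goal")
        agent.prompt = optimizer_step(agent.prompt, gradient)
        # Share the lesson with peer agents through the layer's meta-prompt.
        if GOAL_RULE not in layer.meta_prompt:
            layer.meta_prompt += " " + GOAL_RULE
    return None, max_iters


lower = Layer(agents=[Agent("robot1", "Write a PDDL problem."),
                      Agent("robot2", "Write a PDDL problem.")])
results = {}
for agent, subtask in decompose("sliced apple and washed plate", lower):
    results[agent.name] = plan_subtask(agent, lower, subtask)
print(results)
```

Running it prints `{'robot1': (['sliced apple'], 2), 'robot2': (['washed plate'], 1)}`. robot1 needs one prompt update before the planner accepts its problem, while robot2 succeeds on its first attempt because the lesson reached it through the shared meta-prompt. In the framework as described, the critique and the prompt rewrite would both come from LLM calls, and the planner would be an actual classical PDDL solver.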
Key results
- Achieves 95%, 84%, and 60% success rates on compound, complex, and vague tasks on the MAT-THOR benchmark
- Outperforms the previous state-of-the-art, LaMMA-P, by 2, 7, and 15 percentage points on compound, complex, and vague tasks
- Ablation shows that hierarchical decomposition, textual-gradient prompt optimization, and meta-prompt sharing each improve planning reliability, contributing roughly +59, +37, and +4 percentage points respectively
- Enables scalable, feedback-driven replanning for heterogeneous multi-robot teams without centralized bottlenecks
Why it matters
Enables reliable, scalable natural-language tasking for heterogeneous robot teams in real-world applications like warehouse automation and household assistance.
Abstract
Multi-robot task planning requires decomposing natural-language instructions into executable actions for heterogeneous robot teams. Conventional Planning Domain Definition Language (PDDL) planners provide rigorous guarantees but struggle to handle ambiguous or long-horizon missions, while large language models (LLMs) can interpret instructions and propose plans but may hallucinate or produce infeasible actions. We present a hierarchical multi-agent LLM-based planner with prompt optimization: an upper layer decomposes tasks and assigns them to lower-layer agents, which generate PDDL problems solved by a classical planner. When plans fail, the system applies TextGrad-inspired textual-gradient updates to optimize each agent's prompt and thereby improve planning accuracy. In addition, meta-prompts are learned and shared across agents within the same layer, enabling efficient prompt optimization in multi-agent settings. On the MAT-THOR benchmark, our planner achieves success rates of 0.95 on compound tasks, 0.84 on complex tasks, and 0.60 on vague tasks, improving over the previous state-of-the-art LaMMA-P by 2, 7, and 15 percentage points, respectively. An ablation study shows that the hierarchical structure, prompt optimization, and meta-prompt sharing contribute roughly +59, +37, and +4 percentage points to the overall success rate.
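As a quick consistency check on the headline numbers, the reported gains imply approximate LaMMA-P success rates on each task category. This assumes each gain is the exact difference between the two rounded success rates, so the derived baselines are approximations, not figures quoted from the paper.

```python
# Success rates reported for this planner on MAT-THOR, in percent.
ours = {"compound": 95, "complex": 84, "vague": 60}
# Reported improvement over LaMMA-P, in percentage points.
gain_pp = {"compound": 2, "complex": 7, "vague": 15}

# Implied LaMMA-P rates, assuming each gain is an exact difference.
implied_lammap = {task: ours[task] - gain_pp[task] for task in ours}
print(implied_lammap)  # {'compound': 93, 'complex': 77, 'vague': 45}
```

Working in integer percentages avoids floating-point residue such as `0.95 - 0.02` evaluating to `0.9299999999999999`.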