As You Wish: Mission Planning with Formal Verification Using LLMs in Precision Agriculture
Marcos Zuzuarregui, Stefano Carpin
AI summary
Problem
Natural language ambiguities cause LLM-generated robot missions to frequently misalign with user intent, while existing verification methods require specialized knowledge or constant human oversight.
Approach
The system uses two independent LLMs to generate mission plans and Linear Temporal Logic specifications separately, then employs an automated arbiter and SPIN model checker in a feedback loop to autonomously verify and correct the plans.
Key results
- Fully autonomous LLM-to-robot pipeline with formal verification
- Successful verification of LLM-generated plans without human-in-the-loop
- Real-world validation with end-users demonstrating system strengths and limitations
- Automated co-safe LTL generation and SPIN model checking for mission validation
Why it matters
Empowers non-technical agricultural operators to safely deploy complex autonomous robot missions without specialized planning expertise or continuous supervision.
Abstract
Though robotic systems are now being commer- cialized and deployed in various industries, many of these systems are highly specialized and often require an advanced skill set to operate and ensure they perform as instructed. To mitigate this problem, it has been proposed to use large language models (LLMs) to synthesize mission plans in preci- sion agriculture and other domains based on mission descrip- tions provided in natural language (NL). While these systems demonstrate impressive performance, they also suffer from the inherent ambiguities of NL. In this paper, we address this issue by introducing a planning architecture that combines LLMs with linear temporal logic (LTL) to ensure that, through formal verification, the mission planning system meets the specifications formulated by the user while still using NL. In our proposed system, the mission plan is seen as the implementation and the LTL formalization is seen as the specification. Both are automatically extracted from mission descriptions provided in NL. To mitigate potential bias, two separate LLMs are tasked with the implementation and specification generation. Through feedback loops, the system self-corrects when syntax or verification errors are encountered, thus offering a fully hands-off solution. Through extensive experiments, we highlight the strengths and limitations of integrating mission verification into a fully autonomous pipeline, particularly regarding an LLM’s ability to generate valuable LTL formulas, and show how our proposed implementation addresses and solves these challenges.