Language Enabled Hierarchical Scene Graphs for Precision Agriculture Autonomy
Adam Mukuddem, John Adam Speed-Andrews, Paul Amayo
AI summary
Problem
Agricultural environments are highly homogeneous, which makes it harder for robots to ground natural language instructions and to navigate than in structured indoor settings. Existing methods often rely on complex navigation stacks or lack spatial awareness in uniform fields.
Approach
The authors adapt a 3D scene graph with a novel enumeration addressing system to uniquely identify plants, rows, and lines, then use an LLM to ground natural language commands to these graph nodes and generate plans for a Visual Teach and Repeat navigation framework.
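To make the addressing idea concrete, here is a minimal sketch of how an enumeration-based address could uniquely identify nodes in a hierarchical graph. It is not the authors' implementation: the nesting order (line, then row, then plant), the 1-based indices, and the names `Node`, `build_field`, and `resolve` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical address format: a tuple of 1-based indices, one per level.
# The ordering (line, row, plant) is an assumption, not taken from the paper.
Address = Tuple[int, ...]


@dataclass
class Node:
    kind: str
    address: Address
    children: Dict[int, "Node"] = field(default_factory=dict)

    def add_child(self, kind: str) -> "Node":
        # Enumerate children in insertion order so every address is unique.
        index = len(self.children) + 1
        child = Node(kind, self.address + (index,))
        self.children[index] = child
        return child


def build_field(lines: int, rows_per_line: int, plants_per_row: int) -> Node:
    """Build a uniform three-level hierarchy under a single root node."""
    root = Node("field", ())
    for _ in range(lines):
        line = root.add_child("line")
        for _ in range(rows_per_line):
            row = line.add_child("row")
            for _ in range(plants_per_row):
                row.add_child("plant")
    return root


def resolve(root: Node, address: Address) -> Optional[Node]:
    """Walk the hierarchy one index at a time; return None if no such node."""
    node: Optional[Node] = root
    for index in address:
        node = node.children.get(index)
        if node is None:
            return None
    return node


field_graph = build_field(lines=2, rows_per_line=3, plants_per_row=4)
target = resolve(field_graph, (2, 1, 3))  # line 2, row 1, plant 3
```

Because visually identical plants differ only in their indices, a language model in this sketch would only need to emit an address such as `(2, 1, 3)` for a command to be grounded to exactly one node.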
Key results
- Addressing system for homogeneous agricultural environments
- LLM and 3D scene graph integration for natural language grounding
- Teach and repeat framework for autonomy without complex planning
- Real-world validation on agricultural farm data and on-field experiments
Why it matters
Enables intuitive human-robot collaboration in precision agriculture by allowing operators to issue natural language commands to farm robots without requiring extensive prior mapping or complex navigation software.
Abstract
Human-robot collaboration has emerged as a pivotal area in the advancement of precision agricultural systems [1]. This strategy exploits the distinct strengths of both humans and robots while minimising the exertion of each [2]. A central aim within human-robot collaboration is to create robotic systems that are capable of understanding instructions given in natural language. Agricultural settings, especially those with structured rows of crops, are characteristically uniform, which makes it difficult to ground instructions accurately and to navigate the space. In this paper, we establish a systematic method for robotic platforms operating within agricultural settings to recognise natural language directives and autonomously traverse toward specified targets, gathering data en route. We advance the 3D scene graph model introduced in Osiris [3], adapting it to support autonomy through a Visual Teach and Repeat paradigm, which does not rely on an expansive navigation stack. Additionally, we exploit large language models to correctly ground instructions within the newly constructed 3D scene graph representation, thus enabling natural language directives to be relayed to robotic systems in agricultural contexts. The system’s ability to interpret and execute natural language commands is validated and evaluated in a practical agricultural scenario using a ground robot.
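As a rough illustration of why a teach-and-repeat paradigm can avoid a full planner, the sketch below treats a taught route as an ordered list of node addresses and derives a plan by slicing that list. This is an assumption-laden toy, not the paper's method: the function `plan_repeat`, the tuple addresses, and the idea that a taught route may be replayed in reverse are all hypothetical.

```python
from typing import List, Tuple

Address = Tuple[int, ...]  # hypothetical node identifier, e.g. (row, plant)


def plan_repeat(taught_path: List[Address],
                current: Address,
                target: Address) -> List[Address]:
    """Return the segment of the taught path between current and target.

    Assumes both addresses were visited during the teach run and that the
    route can be replayed in either direction (an assumption of this sketch).
    Raises ValueError if either address is not on the taught path.
    """
    start = taught_path.index(current)
    goal = taught_path.index(target)
    if start <= goal:
        return taught_path[start:goal + 1]
    # Target lies earlier on the route: replay the segment backwards.
    return taught_path[goal:start + 1][::-1]


# A taught route recorded as the sequence of nodes the robot passed.
taught = [(1, 1), (1, 2), (1, 3), (2, 3), (2, 2)]
forward = plan_repeat(taught, current=(1, 2), target=(2, 3))
backward = plan_repeat(taught, current=(2, 3), target=(1, 1))
```

In this toy, planning reduces to a list lookup once a command has been grounded to a target address, which is the sense in which no separate path planner is needed.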