ICRA 2026

Language Enabled Hierarchical Scene Graphs for Precision Agriculture Autonomy

Adam Mukuddem, John Adam Speed-Andrews, Paul Amayo

AI summary

Integrating enumerated 3D scene graphs with large language models and a Visual Teach and Repeat paradigm enables precise natural language navigation for agricultural robots in homogeneous crop rows.

Natural language navigation, 3D scene graphs, Large language models, Visual teach and repeat, Precision agriculture, Robot autonomy

Problem

Agricultural environments are highly homogeneous, which makes it harder for robots to ground natural language instructions and navigate than in structured indoor settings. Existing methods often rely on complex navigation stacks or lack spatial awareness in uniform fields.

Approach

The authors adapt a 3D scene graph with a novel enumeration addressing system to uniquely identify plants, rows, and lines, then use an LLM to ground natural language commands to these graph nodes and generate plans for a Visual Teach and Repeat navigation framework.
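To make the idea of an enumerated address concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a simple field → row → plant hierarchy in which every node is identified by a tuple of 1-based indices; the `Node` class, `build_field`, and `resolve` are hypothetical names chosen for illustration, and the paper's actual hierarchy (plants, rows, and lines) may be organised differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Node:
    """One scene-graph node, identified by its enumerated address (assumed scheme)."""
    address: Tuple[int, ...]  # () for the field, (row,) for a row, (row, plant) for a plant
    kind: str
    children: Dict[int, "Node"] = field(default_factory=dict)


def build_field(num_rows: int, plants_per_row: int) -> Node:
    """Build a toy field in which every row and plant gets a unique 1-based index."""
    root = Node(address=(), kind="field")
    for r in range(1, num_rows + 1):
        row = Node(address=(r,), kind="row")
        for p in range(1, plants_per_row + 1):
            row.children[p] = Node(address=(r, p), kind="plant")
        root.children[r] = row
    return root


def resolve(root: Node, address: Tuple[int, ...]) -> Node:
    """Walk down the hierarchy one index at a time; raises KeyError for unknown addresses."""
    node = root
    for index in address:
        node = node.children[index]
    return node
```

With such a scheme, visually identical plants remain distinguishable: `resolve(build_field(3, 4), (2, 3))` returns the node for the third plant in the second row, regardless of what it looks like.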

Key results

Addressing system for homogeneous agricultural environments
LLM and 3D scene graph integration for natural language grounding
Teach and repeat framework for autonomy without complex planning
Real-world validation on agricultural farm data and on-field experiments

Why it matters

Enables intuitive human-robot collaboration in precision agriculture by allowing operators to issue natural language commands to farm robots without requiring extensive prior mapping or complex navigation software.

Abstract

The focus on human-robot collaboration has emerged as a pivotal area in the advancement of precision agricultural systems [1]. This strategy exploits the distinct strengths of both humans and robots while minimising the exertion of each [2]. A central aim within human-robot collaboration is to create robotic systems that are capable of understanding instructions given in natural language. Agricultural settings, especially those with structured rows of crops, are characteristically uniform, presenting difficulties in accurately grounding instructions and navigating the space. In this paper, we establish a systematic method for robotic platforms operating within agricultural settings to recognize natural language directives and autonomously traverse toward specified targets, gathering data en route. We advance the 3D scene graph model introduced in Osiris [3], adapting it to support autonomy through a Visual Teach and Repeat paradigm, which does not rely on an expansive navigation stack. Additionally, we exploit large language models to correctly ground instructions within the newly constructed 3D scene graph representation, thus enabling natural language directives to be relayed to robotic systems in agricultural contexts. The system’s ability to interpret and execute natural language commands is confirmed through validation and evaluation in a practical agricultural scenario via a ground robot.
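The step from a grounded instruction to a teach-and-repeat plan can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the paper's method: the language model is assumed to reply with a JSON object containing `row` and `plant` keys, the taught route is assumed to be an ordered list of (row, plant) addresses recorded during the teach pass, and the route is assumed to be repeatable in reverse. The function names `parse_grounding` and `plan_repeat` are hypothetical, and no language model is actually called.

```python
import json
from typing import List, Tuple

Address = Tuple[int, int]  # assumed (row, plant) address


def parse_grounding(llm_reply: str) -> Address:
    """Turn an assumed JSON reply such as '{"row": 2, "plant": 3}' into an address."""
    data = json.loads(llm_reply)
    return (int(data["row"]), int(data["plant"]))


def plan_repeat(taught_path: List[Address], current_index: int, target: Address) -> List[Address]:
    """Return the slice of the taught path to replay from the current position to the target.

    No free-space planning happens here: the plan is only ever a segment of the
    previously taught route, replayed forwards or (by assumption) backwards.
    Raises ValueError if the target was never visited during teaching.
    """
    target_index = taught_path.index(target)
    if target_index >= current_index:
        return taught_path[current_index:target_index + 1]
    return taught_path[target_index:current_index + 1][::-1]


# Toy taught route: down row 1, then back along row 2.
taught = [(1, 1), (1, 2), (1, 3), (2, 3), (2, 2), (2, 1)]
goal = parse_grounding('{"row": 2, "plant": 3}')
plan = plan_repeat(taught, current_index=0, target=goal)
```

In this toy case `plan` is the first four entries of the taught route, ending at the requested plant. Restricting plans to segments of a recorded route is what lets a teach-and-repeat approach avoid a general-purpose planner.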

Index terms

Robotics and Automation in Agriculture and Forestry, Field Robots