ICRA 2026

Visual Scene Understanding-Based Task Planning for an Efficient Multipurpose Agricultural Robot System

Yonghyun Park, Hyoung Il Son


AI summary


Fusing scene graphs with rule-based planning enables agricultural robots to dynamically select tasks and intelligently manage flexible versus rigid obstacles for efficient multi-arm execution.

agricultural robotics, scene graph generation, visual scene understanding, task planning, cooperative control, crop automation

Problem

Current agricultural robots rely on isolated object detection or uniform obstacle avoidance, lacking the contextual awareness to interpret inter-object relationships and physical attributes required for efficient, multi-task operations in unstructured farm environments.

Approach

A visual scene understanding pipeline that detects objects and predicts their relationships and attributes (e.g., rigidity, ripeness) to build a scene graph, which drives a rule-based planner to choose tasks and cooperative dual-arm strategies.
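To make the graph-to-planner step concrete, here is a minimal Python sketch of how a scene graph with attributes could drive a rule-based decision. It is not the authors' implementation: the class names, the `occludes` predicate, the attribute values, and both rules (ripe fruit is harvested and other fruit is thinned; a flexible occluder triggers dual-arm execution) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    label: str                                   # e.g. "fruit", "leaf", "stem"
    attributes: dict = field(default_factory=dict)


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)    # (subject, predicate, object)

    def add_node(self, name, label, **attributes):
        self.nodes[name] = Node(name, label, attributes)

    def add_edge(self, subject, predicate, obj):
        self.edges.append((subject, predicate, obj))

    def occluders_of(self, target):
        # Nodes that appear as the subject of an "occludes" edge on target.
        return [self.nodes[s] for s, p, o in self.edges
                if p == "occludes" and o == target]


def plan(graph, target):
    """Return (task, arm_mode) for one target node using hypothetical rules."""
    node = graph.nodes[target]
    # Hypothetical task rule: ripe fruit -> harvest, other fruit -> thin,
    # any non-fruit target -> prune.
    if node.label == "fruit":
        task = "harvest" if node.attributes.get("ripeness") == "ripe" else "thin"
    else:
        task = "prune"
    # Hypothetical cooperation rule: a flexible occluder can be held aside by
    # a second arm; rigid or absent occluders keep single-arm execution.
    flexible = any(o.attributes.get("rigidity") == "flexible"
                   for o in graph.occluders_of(target))
    return task, ("dual" if flexible else "single")


g = SceneGraph()
g.add_node("fruit1", "fruit", ripeness="ripe")
g.add_node("fruit2", "fruit", ripeness="unripe")
g.add_node("leaf1", "leaf", rigidity="flexible")
g.add_node("stem1", "stem", rigidity="rigid")
g.add_edge("leaf1", "occludes", "fruit1")
g.add_edge("stem1", "occludes", "fruit2")

print(plan(g, "fruit1"))  # ('harvest', 'dual')
print(plan(g, "fruit2"))  # ('thin', 'single')
```

In this sketch the arm-mode decision depends on both a predicted relationship and a predicted attribute, so an error in either one changes the output. The task decision reads a single attribute of the target.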

Key results

38.9% relationship R@50 and 70.1% attribute R@50 on a re-annotated custom dataset
72.3% task-decision and 53.7% cooperative-control accuracy
Dual-arm selection twice as sensitive to perception errors as task assignment
Crop-adaptable framework distinguishing flexible from rigid obstacles
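For readers unfamiliar with R@50, the following Python sketch shows one common way Recall@K is computed for scene graph triplets: keep the K highest-scoring predictions and measure the fraction of ground-truth triplets they recover. It is a simplified illustration that matches triplets by label only and ignores bounding-box overlap. The function name and the example triplets are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def recall_at_k(predictions, ground_truth, k=50):
    """Fraction of ground-truth triplets found among the top-k predictions.

    predictions: list of (score, triplet) pairs
    ground_truth: collection of triplets, e.g. ("leaf1", "occludes", "fruit1")
    """
    truth = set(ground_truth)
    if not truth:
        return 0.0
    # Keep only the k most confident predictions.
    top_k = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    hits = {triplet for _, triplet in top_k} & truth
    return len(hits) / len(truth)


preds = [
    (0.9, ("leaf1", "occludes", "fruit1")),
    (0.8, ("fruit1", "attached_to", "stem1")),
    (0.3, ("leaf2", "occludes", "fruit1")),
]
truth = [
    ("leaf1", "occludes", "fruit1"),
    ("leaf2", "occludes", "fruit1"),
]

print(recall_at_k(preds, truth, k=2))  # 0.5
print(recall_at_k(preds, truth, k=3))  # 1.0
```

With k=2 only one of the two ground-truth relationships is among the retained predictions, so recall is 0.5. Raising k to 3 admits the low-confidence correct triplet and recall reaches 1.0.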

Why it matters

Provides a scalable perception-planning framework that enables agricultural robots to execute complex, adaptive field operations, directly addressing labor shortages and automation inefficiencies in smart farming.

Abstract

This study introduces a visual scene understanding (VSU) pipeline that fuses scene graph generation (SGG) with task planning for agricultural robots. Mask R-CNN detects fruits, leaves, and stems; object features feed heads that predict predicates and attributes such as rigidity and ripeness. The resulting graph triggers a rule-based planner that chooses among harvesting, pruning, and thinning and decides on single- or dual-arm execution. Evaluated on a re-annotated custom dataset, the full pipeline reaches 38.9% relationship R@50, 70.1% attribute R@50, 72.3% task-decision accuracy, and 53.7% cooperative-control accuracy. Results show that dual-arm selection is twice as sensitive to perception errors as task-type assignment. The work provides an agriculture-specific task planning framework that distinguishes flexible from rigid obstacles, demonstrating that relational and attribute information improves perception in agricultural scenes.

Index terms

Robotics and Automation in Agriculture and Forestry; Agricultural Automation; Robotics and Automation in Life Sciences