Visual Scene Understanding-Based Task Planning for an Efficient Multipurpose Agricultural Robot System
Yonghyun Park, Hyoung Il Son
AI summary
Problem
Current agricultural robots rely on isolated object detection or uniform obstacle avoidance, lacking the contextual awareness to interpret inter-object relationships and physical attributes required for efficient, multi-task operations in unstructured farm environments.
Approach
A visual scene understanding pipeline that detects objects and predicts their relationships and attributes (e.g., rigidity, ripeness) to build a scene graph, which drives a rule-based planner to choose tasks and cooperative dual-arm strategies.
Key results
- 38.9% relationship R@50 and 70.1% attribute R@50 on a re-annotated custom dataset
- 72.3% task-decision and 53.7% cooperative-control accuracy
- Dual-arm selection twice as sensitive to perception errors as task assignment
- Crop-adaptable framework distinguishing flexible from rigid obstacles
Why it matters
Provides a scalable perception-planning framework that enables agricultural robots to execute complex, adaptive field operations, directly addressing labor shortages and automation inefficiencies in smart farming.
Abstract
This study introduces a visual scene understanding (VSU) pipeline that fuses scene graph generation (SGG) with task planning for agricultural robots. Mask R-CNN detects fruits, leaves, and stems; the resulting object features feed prediction heads for predicates and for attributes such as rigidity and ripeness. The resulting scene graph triggers a rule-based planner that chooses among harvesting, pruning, and thinning, and decides on single- or dual-arm execution. Evaluated on a re-annotated custom dataset, the full pipeline reaches 38.9% relationship R@50, 70.1% attribute R@50, 72.3% task-decision accuracy, and 53.7% cooperative-control accuracy. Results show that dual-arm selection is twice as sensitive to perception errors as task-type assignment. The work provides an agriculture-specific task-planning framework that distinguishes flexible from rigid obstacles, demonstrating that relational and attribute information improves perception in agricultural scenes.
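To make the graph-to-plan step concrete, the following is a minimal Python sketch of a rule-based planner operating on a scene graph. It is not the authors' implementation: the `Node` class, the `plan` function, the predicate names (`occludes`, `adjacent_to`), the attribute strings, and every rule are hypothetical, chosen only to illustrate how object attributes and relationships could jointly select a task and an arm mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One detected object in the scene graph (hypothetical structure)."""
    label: str                          # "fruit", "leaf", or "stem"
    attributes: frozenset = frozenset()  # e.g. {"ripe"}, {"flexible"}


def plan(nodes, edges, target):
    """Return (task, arm_mode) for a target fruit.

    nodes: dict mapping object name -> Node
    edges: list of (subject, predicate, object) triples
    All rules below are illustrative assumptions, not the paper's rule set.
    """
    fruit = nodes[target]
    # Objects that the graph reports as occluding the target.
    occluders = [nodes[s] for (s, pred, o) in edges
                 if pred == "occludes" and o == target]
    # The target is "crowded" if any adjacency edge touches it.
    crowded = any(pred == "adjacent_to" and target in (s, o)
                  for (s, pred, o) in edges)

    # Task choice: assumed priority order harvest > thin > prune.
    if "ripe" in fruit.attributes:
        task = "harvest"
    elif crowded:
        task = "thin"
    elif any(n.label == "leaf" for n in occluders):
        task = "prune"
    else:
        task = "none"

    # Arm mode: assume a flexible occluder can be held aside by a second
    # arm, whereas a rigid one cannot, so only the former triggers "dual".
    flexible_block = any("flexible" in n.attributes for n in occluders)
    arm_mode = "dual" if task != "none" and flexible_block else "single"
    return task, arm_mode


# Toy scene: a ripe fruit behind a flexible leaf, an unripe fruit behind a rigid stem.
nodes = {
    "fruit1": Node("fruit", frozenset({"ripe"})),
    "fruit2": Node("fruit", frozenset({"unripe"})),
    "leaf1": Node("leaf", frozenset({"flexible"})),
    "stem1": Node("stem", frozenset({"rigid"})),
}
edges = [("leaf1", "occludes", "fruit1"), ("stem1", "occludes", "fruit2")]

print(plan(nodes, edges, "fruit1"))
print(plan(nodes, edges, "fruit2"))
```

In this sketch the arm-mode decision depends on both a correct `occludes` edge and a correct rigidity attribute, while the task decision for a ripe fruit depends on a single attribute. That gives one intuition for why a cooperative-control choice could be more exposed to perception errors than the task choice.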