An Annotation-To-Detection Framework for Autonomous and Robust Vine Trunk Localization in the Field by Mobile Agricultural Robots
Dimitrios Chatziparaschis, Elia Scudiero, Brent Sams, Konstantinos Karydis
AI summary
Problem
Autonomous agricultural robots struggle with reliable, real-time plant detection in dynamic, unstructured field environments, especially when large manually labeled datasets are unavailable.
Approach
The system uses a frozen semantic annotator and LiDAR to generate cross-modal pseudo-labels, which are iteratively refined through a multi-stage training pipeline to build a robust YOLOv10-based detector with minimal human intervention.
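The cross-modal pseudo-labelling step can be illustrated with a minimal sketch. It assumes a calibrated pinhole camera with intrinsics K, a 4x4 LiDAR-to-camera extrinsic transform, and a binary trunk mask produced by the frozen semantic annotator. The function names (project_points, transfer_mask_labels, pseudo_box) and the min_depth cutoff are hypothetical and not taken from the paper. LiDAR points that project onto mask pixels inherit the trunk label, and their image footprint gives a box-style pseudo-label of the kind a YOLO-style detector could be trained on.

```python
import numpy as np


def project_points(points_lidar, K, T_cam_lidar):
    """Project Nx3 LiDAR points into the image; returns pixel coords (Nx2) and depth (N)."""
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])   # N x 4 homogeneous coords
    pts_cam = (T_cam_lidar @ homo.T).T[:, :3]           # N x 3 in the camera frame
    depth = pts_cam[:, 2]
    # Points at or behind the image plane get an infinite divisor (filtered later)
    safe_z = np.where(depth > 1e-6, depth, np.inf)
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / safe_z[:, None]
    return uv, depth


def transfer_mask_labels(points_lidar, mask, K, T_cam_lidar, min_depth=0.1):
    """Label each LiDAR point True if it projects onto a trunk pixel of the 2D mask."""
    h, w = mask.shape
    uv, depth = project_points(points_lidar, K, T_cam_lidar)
    u = np.floor(uv[:, 0]).astype(int)
    v = np.floor(uv[:, 1]).astype(int)
    valid = (depth > min_depth) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels = np.zeros(len(points_lidar), dtype=bool)
    labels[valid] = mask[v[valid], u[valid]]
    return labels


def pseudo_box(uv, labels):
    """Axis-aligned (u_min, v_min, u_max, v_max) box around the labelled points."""
    if not labels.any():
        return None
    sel = uv[labels]
    return (sel[:, 0].min(), sel[:, 1].min(), sel[:, 0].max(), sel[:, 1].max())


# Toy example: identity extrinsics, so LiDAR and camera frames coincide (z points forward)
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
T = np.eye(4)
pts = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0], [0.0, 0.0, -5.0], [10.0, 0.0, 1.0]])
mask = np.zeros((100, 100), dtype=bool)
mask[40:60, 45:55] = True
labels = transfer_mask_labels(pts, mask, K, T)
uv, _ = project_points(pts, K, T)
box = pseudo_box(uv, labels)
```

Only the first point lands on the mask here: the second projects outside the trunk region, the third lies behind the camera, and the fourth falls off the image.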
Key results
- 0.83 precision and 0.53 recall after iterative training
- Over 70% tree detection rate with under 0.37 m mean localization error
- Generation of globally georeferenced, multi-modal sparse point maps
- Robust detection across diverse lighting and crop densities
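The detection rate and mean localization error listed above can be computed by matching mapped trunk positions to reference tree positions. The sketch below assumes greedy one-to-one nearest-neighbour association with a hypothetical 1.0 m gating radius (max_dist); associate_trees and localization_metrics are illustrative names, and the paper's actual tree association module may work differently.

```python
import numpy as np


def associate_trees(detections, ground_truth, max_dist=1.0):
    """Greedy one-to-one matching of detected trunk positions to reference trees.

    Returns a list of (gt_index, det_index, distance) tuples.
    """
    det = np.asarray(detections, dtype=float)
    gt = np.asarray(ground_truth, dtype=float)
    if len(det) == 0 or len(gt) == 0:
        return []
    # Pairwise Euclidean distances, shape (n_gt, n_det)
    dist = np.linalg.norm(gt[:, None, :] - det[None, :, :], axis=-1)
    pairs, used_gt, used_det = [], set(), set()
    # Visit candidate pairs from closest to farthest
    for flat in np.argsort(dist, axis=None):
        i, j = (int(k) for k in np.unravel_index(flat, dist.shape))
        if dist[i, j] > max_dist:
            break  # every remaining pair is farther than the gate
        if i in used_gt or j in used_det:
            continue
        used_gt.add(i)
        used_det.add(j)
        pairs.append((i, j, float(dist[i, j])))
    return pairs


def localization_metrics(detections, ground_truth, max_dist=1.0):
    """Detection rate (matched / reference trees) and mean matched distance error."""
    pairs = associate_trees(detections, ground_truth, max_dist)
    rate = len(pairs) / len(ground_truth) if len(ground_truth) else 0.0
    mean_err = float(np.mean([p[2] for p in pairs])) if pairs else float("nan")
    return rate, mean_err


# Toy row of four trees spaced 2 m apart; coordinates are made up for illustration
gt_trees = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0)]
detected = [(0.3, 0.0), (2.0, 0.4), (9.0, 0.0)]
rate, err = localization_metrics(detected, gt_trees)  # rate 0.5, err ~0.35 m
```

Visiting pairs from the closest outward is simple and usually adequate when trunks are well separated along a row; an optimal assignment such as Hungarian matching would be a drop-in alternative for denser layouts.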
Why it matters
Enables scalable, low-cost deployment of autonomous agricultural robots for precision farming without relying on extensive manual data collection.
Abstract
The dynamic and heterogeneous nature of agricultural fields presents significant challenges for object detection and localization, particularly for autonomous mobile robots tasked with surveying previously unseen, unstructured environments. Concurrently, there is a growing need for real-time detection systems that do not depend on large-scale, manually labeled real-world datasets. In this work, we introduce a comprehensive annotation-to-detection framework designed to train a robust multi-modal detector from limited and partially labeled training data. The proposed methodology combines cross-modal annotation transfer and an early-stage sensor fusion pipeline with a multi-stage detection architecture to effectively train and enhance the system's multi-modal detection capabilities. The effectiveness of the framework was demonstrated through vine trunk detection in novel vineyard settings featuring diverse lighting conditions and varying crop densities. When integrated with a customized multi-modal LiDAR Odometry and Mapping (LOAM) algorithm and a tree association module, the system achieved high-performance trunk localization, identifying over 70% of trees in a single traversal with a mean distance error of less than 0.37 m. The results show that by leveraging multi-modal, incremental-stage annotation and training, the proposed framework achieves robust detection performance despite limited initial annotations, showcasing its potential for real-world, near-ground agricultural applications.
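One common way to realize early-stage (input-level) sensor fusion is to rasterize projected LiDAR returns into a sparse depth channel and stack it with the RGB image before it reaches the detector. This is an assumption about the design, not the authors' documented pipeline; rasterize_depth, early_fuse, and the 20 m max_range normalization are hypothetical. The sketch also assumes the LiDAR points have already been projected to pixel coordinates uv with a per-point depth.

```python
import numpy as np


def rasterize_depth(uv, depth, height, width, max_range=20.0):
    """Scatter projected LiDAR depths into a sparse HxW channel in [0, 1] (0 = no return)."""
    u = np.floor(uv[:, 0]).astype(int)
    v = np.floor(uv[:, 1]).astype(int)
    ok = (depth > 0) & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    nearest = np.full((height, width), np.inf)
    # Keep the closest return when several points fall on the same pixel
    np.minimum.at(nearest, (v[ok], u[ok]), depth[ok])
    hit = np.isfinite(nearest)
    channel = np.zeros((height, width), dtype=np.float32)
    channel[hit] = np.clip(nearest[hit] / max_range, 0.0, 1.0)
    return channel


def early_fuse(rgb, depth_channel):
    """Stack an HxWx3 uint8 RGB image with the depth channel into an HxWx4 float32 array."""
    rgb_f = rgb.astype(np.float32) / 255.0
    return np.concatenate([rgb_f, depth_channel[..., None]], axis=-1)


# Toy 4x4 frame: two returns hit pixel (row 2, col 1); the third is off-image
uv = np.array([[1.5, 2.2], [1.2, 2.9], [10.0, 10.0]])
depth = np.array([10.0, 5.0, 4.0])
fused = early_fuse(np.full((4, 4, 3), 255, dtype=np.uint8), rasterize_depth(uv, depth, 4, 4))
# fused.shape == (4, 4, 4); fused[2, 1, 3] == 0.25 (5 m / 20 m)
```

Using np.minimum.at keeps the nearest return per pixel even when several points share a pixel, which plain fancy-index assignment with duplicate indices does not guarantee.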