ICRA 2026

An Annotation-To-Detection Framework for Autonomous and Robust Vine Trunk Localization in the Field by Mobile Agricultural Robots

Dimitrios Chatziparaschis, Elia Scudiero, Brent Sams, Konstantinos Karydis

AI summary

A multi-modal annotation-to-detection framework enables robust vine trunk localization in unseen vineyards using minimal initial annotations and iterative pseudo-labeling.

vine trunk detection; multi-modal sensing; pseudo-labeling; autonomous agriculture; LiDAR fusion; few-shot learning

Problem

Autonomous agricultural robots struggle with reliable, real-time plant detection in dynamic, unstructured field environments, especially when large manually labeled datasets are unavailable.

Approach

The system uses a frozen semantic annotator and LiDAR to generate cross-modal pseudo-labels, which are iteratively refined through a multi-stage training pipeline to build a robust YOLOv10-based detector with minimal human intervention.
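
As a rough illustration of what cross-modal pseudo-labeling can look like, the sketch below transfers a binary image mask (such as one produced by a semantic annotator) onto LiDAR points by projecting each point into the image. It is not the authors' implementation. It assumes a pinhole camera model, points already expressed in the camera frame, and a hypothetical helper name `transfer_mask_to_points`; the intrinsics and the toy data are made up.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the camera frame, z pointing forward


def transfer_mask_to_points(
    points_cam: Sequence[Point],
    mask: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[bool]:
    """Label a LiDAR point True if it projects onto a 'trunk' pixel of the mask.

    Hypothetical helper: the pinhole intrinsics (fx, fy, cx, cy) and the
    camera-frame input are assumptions of this sketch, not details from the paper.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels: List[bool] = []
    for x, y, z in points_cam:
        if z <= 0.0:
            # Points behind the camera are not visible and receive no label.
            labels.append(False)
            continue
        # Pinhole projection to integer pixel coordinates (u: column, v: row).
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        inside = 0 <= u < width and 0 <= v < height
        labels.append(bool(inside and mask[v][u]))
    return labels


# Toy data: a 4x4 mask whose 'trunk' pixels form the column u = 2.
mask = [[0, 0, 1, 0] for _ in range(4)]
points = [(0.0, 0.0, 2.0), (-1.0, 0.0, 2.0), (0.0, 0.0, -1.0)]
labels = transfer_mask_to_points(points, mask, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(labels)
```

In an iterative scheme, labels of this kind could seed a first detector whose confident predictions are then added to the training set for the next stage. The exact refinement rule used in the paper is not described in this summary.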

Key results

0.83 precision and 0.53 recall after iterative training
Over 70% of trees detected in a single traversal, with a mean localization error under 0.37 m
Generation of globally georeferenced, multi-modal sparse point maps
Robust detection across diverse lighting and crop densities
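
For readers unfamiliar with how figures like these are typically obtained, the following sketch computes precision and recall from detection counts, and a detection rate and mean localization error from matched trunk positions. It is only an illustration. The greedy nearest-neighbor matching, the gating distance `gate_m`, the function names, and all numbers in the toy example are assumptions, not the paper's tree association module or its data.

```python
import math
from typing import List, Sequence, Set, Tuple

XY = Tuple[float, float]  # planar position in meters


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def localization_stats(
    estimates: Sequence[XY], ground_truth: Sequence[XY], gate_m: float
) -> Tuple[float, float]:
    """Return (detection rate, mean error in meters) via greedy one-to-one matching.

    Assumption of this sketch: an estimate counts as a detection only if it
    lies within gate_m of a ground-truth trunk that is not already matched.
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.dist(e, g), i, j)
        for i, e in enumerate(estimates)
        for j, g in enumerate(ground_truth)
    )
    used_est: Set[int] = set()
    used_gt: Set[int] = set()
    errors: List[float] = []
    for dist, i, j in pairs:
        if dist > gate_m:
            break  # remaining pairs are even farther apart
        if i in used_est or j in used_gt:
            continue
        used_est.add(i)
        used_gt.add(j)
        errors.append(dist)
    rate = len(errors) / len(ground_truth) if ground_truth else 0.0
    mean_err = sum(errors) / len(errors) if errors else float("nan")
    return rate, mean_err


# Toy numbers, invented for illustration only.
p, r = precision_recall(tp=8, fp=2, fn=8)
truth = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0)]
found = [(0.3, 0.0), (2.0, 0.4), (9.0, 9.0)]
rate, mean_err = localization_stats(found, truth, gate_m=1.0)
print(p, r, rate, round(mean_err, 3))
```

A one-to-one matching is used here so that several estimates near the same trunk are not counted as several detections; an optimal assignment (for example, the Hungarian algorithm) would be a reasonable alternative to the greedy pass.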

Why it matters

Enables scalable, low-cost deployment of autonomous agricultural robots for precision farming without relying on extensive manual data collection.

Abstract

The dynamic and heterogeneous nature of agricultural fields presents significant challenges for object detection and localization, particularly for autonomous mobile robots that are tasked with surveying previously unseen unstructured environments. Concurrently, there is a growing need for real-time detection systems that do not depend on large-scale manually labeled real-world datasets. In this work, we introduce a comprehensive annotation-to-detection framework designed to train a robust multi-modal detector using limited and partially labeled training data. The proposed methodology incorporates cross-modal annotation transfer and an early-stage sensor fusion pipeline, which, in conjunction with a multi-stage detection architecture, effectively trains and enhances the system’s multi-modal detection capabilities. The effectiveness of the framework was demonstrated through vine trunk detection in novel vineyard settings that featured diverse lighting conditions and varying crop densities to validate performance. When integrated with a customized multi-modal LiDAR Odometry and Mapping (LOAM) algorithm and a tree association module, the system demonstrated high-performance trunk localization, successfully identifying over 70% of trees in a single traversal with a mean distance error of less than 0.37 m. The results reveal that by leveraging multi-modal, incremental-stage annotation and training, the proposed framework achieves robust detection performance regardless of limited starting annotations, showcasing its potential for real-world and near-ground agricultural applications.

Index terms

Robotics and Automation in Agriculture and Forestry; Agricultural Automation; Field Robots