ICRA 2026

DropClick: Semi-Automated One-Click Segmentation for Agricultural Robotic Data

Patrick Zimmer, Michael Allan Halstead, Christopher Steven McCool

AI summary

DropClick enables accurate, semi-automated instance segmentation for agricultural datasets using only a few training images and partial user clicks.

semi-automated segmentation, one-click annotation, agricultural robotics, pseudo-labeling, instance segmentation, transformer networks

Problem

Manual segmentation annotation is costly and time-consuming, with existing click-based methods requiring user input for every object in a scene.

Approach

DropClick uses a transformer network trained on minimal labeled data to predict segmentation masks for both clicked and unclicked objects in a single pass.
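To make the idea of unclicked objects concrete, the following minimal sketch shows one way partial user input could be simulated when evaluating a click-guided model: a fixed fraction of the per-object clicks is removed at random before the clicks are passed on. The function name `drop_clicks`, the `(x, y)` click representation, and the uniform random dropping are assumptions for illustration; the summary does not describe how DropClick encodes or drops clicks.

```python
import random


def drop_clicks(clicks, drop_fraction, seed=0):
    """Return the clicks that remain after removing a random subset.

    clicks        -- list of (x, y) pixel coordinates, one per object (assumed format)
    drop_fraction -- share of clicks to remove, between 0.0 and 1.0
    seed          -- seed for reproducible sampling
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)
    # Number of clicks to remove, rounded to the nearest integer.
    n_drop = round(drop_fraction * len(clicks))
    dropped = set(rng.sample(range(len(clicks)), n_drop))
    # Keep the original order of the surviving clicks.
    return [c for i, c in enumerate(clicks) if i not in dropped]


# Hypothetical clicks on four objects in one image.
clicks = [(12, 40), (88, 17), (53, 61), (140, 95)]
kept = drop_clicks(clicks, drop_fraction=0.5)
print(len(kept), "of", len(clicks), "clicks kept")
```

With a drop fraction of 0.5, half of the objects in this example receive no click, so a model evaluated on `kept` would have to segment those objects without user guidance.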

Key results

Trained on just 5 hand-annotated images per dataset
Achieves an mIoU of 70.0 on SB20 and 72.6 on BUP20
Retains an mIoU of 68.9 (SB20) and 71.3 (BUP20) when 50% of clicks are missing
Reduces user input by 31.9–46.3% while preserving downstream detection performance
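The mIoU figures above are mean intersection-over-union scores reported in points (0 to 100). As a rough illustration, the sketch below computes IoU for binary masks and averages it over prediction and ground-truth pairs. The averaging scheme (over mask pairs rather than, say, over classes) and the helper names `iou` and `mean_iou` are assumptions; the summary does not state exactly how the paper averages its scores.

```python
def iou(pred, target):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Mean IoU over (prediction, ground truth) pairs, scaled to points."""
    if not pairs:
        raise ValueError("need at least one mask pair")
    scores = [iou(pred, target) for pred, target in pairs]
    return 100.0 * sum(scores) / len(scores)


# Two toy 2x2 masks: one half-correct prediction and one perfect prediction.
pairs = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # intersection 1, union 2
    ([[1, 1], [1, 1]], [[1, 1], [1, 1]]),  # intersection 4, union 4
]
print(mean_iou(pairs))  # 75.0
```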

Why it matters

It drastically cuts annotation costs and accelerates the deployment of vision-based robotic systems in precision agriculture.

Abstract

Labelling vision datasets, especially for segmentation tasks, is a laborious and costly process that stymies novel developments in agricultural robotics. In this paper, we present DropClick, a click-guided segmentation tool that simplifies the annotation process. Our system utilises single-click inputs on objects to generate pseudo-labels, which can replace manual annotations. DropClick stands out because it is a semi-automated approach that does not require a click for every object in the scene, which further reduces the amount of user input required. We evaluate our method on two challenging agricultural robotic datasets, SB20 and BUP20, for plant and fruit segmentation, respectively. DropClick is first trained on a small subset of just 5 images from the original training data. The trained model can then be deployed as a one-click segmentation system and achieves comparable or higher performance than other one-click methods, with an mIoU of 70.0 and 72.6 points for SB20 and BUP20, respectively. DropClick also maintains high performance when clicks are not given (i.e. dropped): when 50% of the clicks are missing, it still achieves an mIoU of 68.9 and 71.3 points for SB20 and BUP20, respectively. We validate DropClick as a pseudo-labelling approach by using its outputs to train a Mask2Former instance segmentation model in a semi-supervised manner. In this process, partially removing user input from DropClick yields performance similar to providing all clicks: 70.1 vs 70.7 points AP50 for SB20, and no difference for BUP20 at 77.0 for both models, while saving 46.3% of total input for SB20 and 31.9% for BUP20.

Index terms

Robotics and Automation in Agriculture and Forestry; Agricultural Automation; Computer Vision for Automation