DropClick: Semi-Automated One-Click Segmentation for Agricultural Robotic Data
Patrick Zimmer, Michael Allan Halstead, Christopher Steven McCool
AI summary
Problem
Manual segmentation annotation is costly and time-consuming, with existing click-based methods requiring user input for every object in a scene.
Approach
DropClick uses a transformer network trained on minimal labeled data to predict segmentation masks for both clicked and unclicked objects in a single pass.
Key results
- Trained on just 5 hand-annotated images per dataset
- Achieves mIoU of 70.0 and 72.6 on the SB20 and BUP20 datasets, respectively
- Maintains mIoU of 68.9 (SB20) and 71.3 (BUP20) with 50% of clicks missing
- Reduces total user input by 31.9–46.3% while largely preserving downstream instance segmentation performance (AP50)
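The mIoU figures above average the intersection-over-union between predicted and ground-truth masks. The exact evaluation protocol (per-object vs per-class averaging, handling of missed objects) is not stated here, so the following is only a minimal Python sketch assuming per-object averaging, with each mask represented as a set of (row, col) pixels. The function names iou and mean_iou are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Pixel = Tuple[int, int]


def iou(pred: Set[Pixel], gt: Set[Pixel]) -> float:
    """Intersection over union of two binary masks given as pixel sets."""
    union = pred | gt
    if not union:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return len(pred & gt) / len(union)


def mean_iou(preds: Dict[Hashable, Set[Pixel]],
             gts: Dict[Hashable, Set[Pixel]]) -> float:
    """Average IoU over ground-truth objects, reported in points (0-100).

    Assumption: predictions are matched to ground truth by object id, and an
    object with no prediction counts as an empty mask (IoU 0).
    """
    if not gts:
        raise ValueError("need at least one ground-truth object")
    scores = [iou(preds.get(obj_id, set()), mask) for obj_id, mask in gts.items()]
    return 100.0 * sum(scores) / len(scores)
```

For example, a prediction covering half of object "a" and nothing for object "b" gives per-object IoUs of 0.5 and 0.0, i.e. an mIoU of 25.0 points.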
Why it matters
It drastically cuts annotation costs and accelerates the deployment of vision-based robotic systems in precision agriculture.
Abstract
Labelling vision datasets, especially for segmentation tasks, is a laborious and costly process that stymies novel developments in agricultural robotics. In this paper, we present DropClick, a click-guided segmentation tool that simplifies the annotation process. Our system utilises single-click inputs on objects to generate pseudo-labels, which can replace manual annotations. DropClick stands out because it is semi-automated: it does not require a click for every object in the scene and can therefore drastically reduce the amount of user input required. We evaluate our method on two challenging agricultural robotic datasets, SB20 and BUP20, for plant and fruit segmentation, respectively. DropClick is first trained on a small subset of just 5 images from the original training data. This DropClick model can then be deployed as a one-click segmentation system, achieving performance comparable to or higher than other one-click methods, with an mIoU of 70.0 and 72.6 points for SB20 and BUP20, respectively. DropClick excels at maintaining high performance when clicks are not given (i.e. dropped): when 50% of the clicks are missing, it still achieves an mIoU of 68.9 and 71.3 points for SB20 and BUP20, respectively. We validate DropClick as a pseudo-labelling approach by using its outputs to train a Mask2Former instance segmentation model in a semi-supervised manner. In this process, partially removing user input from DropClick yields performance similar to providing all clicks: 70.1 vs 70.7 points AP50 for SB20, and no difference for BUP20 (77.0 for both models), while saving 46.3% of total input for SB20 and 31.9% for BUP20.
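The dropped-click evaluation can be approximated by simulating one click per ground-truth object and then withholding a fraction of those clicks. The Python sketch below is hypothetical: the centroid-nearest click heuristic, dropping an exact fraction of clicks at random, and the names simulate_click and drop_clicks are assumptions for illustration, not DropClick's published protocol.

```python
import random
from typing import Dict, Hashable, Set, Tuple

Pixel = Tuple[int, int]


def simulate_click(mask: Set[Pixel]) -> Pixel:
    """One click per object: the mask pixel nearest the mask centroid (assumed heuristic).

    Choosing a mask pixel (rather than the raw centroid) keeps the click on the
    object even when the object is non-convex.
    """
    n = len(mask)
    cy = sum(r for r, _ in mask) / n
    cx = sum(c for _, c in mask) / n
    # Iterate over sorted pixels so ties are broken deterministically.
    return min(sorted(mask), key=lambda p: (p[0] - cy) ** 2 + (p[1] - cx) ** 2)


def drop_clicks(clicks: Dict[Hashable, Pixel], drop_rate: float,
                seed: int = 0) -> Dict[Hashable, Pixel]:
    """Withhold round(drop_rate * N) clicks chosen uniformly at random (assumed scheme).

    The segmentation model would still be expected to segment the objects
    whose clicks were dropped.
    """
    if not 0.0 <= drop_rate <= 1.0:
        raise ValueError("drop_rate must be in [0, 1]")
    rng = random.Random(seed)
    ids = sorted(clicks, key=repr)  # stable order so the seed fully determines the result
    n_drop = round(drop_rate * len(ids))
    dropped = set(rng.sample(ids, n_drop))
    return {i: clicks[i] for i in ids if i not in dropped}
```

With drop_rate=0.5 and four objects, two clicks are kept, and the model would then be asked to segment all four objects from the remaining two clicks. Seeding random.Random keeps the dropped subset reproducible across runs.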