Spatio-Temporal Consistent Semantic Mapping for Robotics Fruit Growth Monitoring
Cyrill Stachniss
AI summary
Problem
Automated long-term fruit monitoring requires recognizing and tracking the same individual fruits across multiple recordings despite drastic scene changes, occlusions, and plant evolution. Existing methods either struggle to do this consistently or require expensive hardware.
Approach
The system captures video with a mobile robot-mounted RGB-D camera, segments individual fruits per frame, and maintains consistent instance IDs across frames and sessions by matching 3D centroids and aligning new poses to a reference map.
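As an illustration of centroid-based association, the following minimal sketch assigns instance IDs to newly detected fruits by greedy nearest-neighbour matching of 3D centroids. It is not the authors' implementation: the function name `match_centroids`, the greedy strategy, and the 5 cm gating distance `max_dist` are assumptions chosen for the example, and the centroids are assumed to be already expressed in a common map frame.

```python
import math


def match_centroids(prev, curr, max_dist=0.05):
    """Greedy nearest-neighbour association of 3D centroids.

    prev: dict mapping instance ID -> (x, y, z) centroid in the map frame (metres)
    curr: list of (x, y, z) centroids detected in the new frame
    max_dist: hypothetical gating distance in metres (assumed value)
    Returns one ID per entry of curr; unmatched detections receive new IDs.
    """
    # Collect all candidate pairs within the gating distance, closest first.
    pairs = sorted(
        (math.dist(p, c), pid, ci)
        for pid, p in prev.items()
        for ci, c in enumerate(curr)
        if math.dist(p, c) <= max_dist
    )

    assigned = [None] * len(curr)
    used_ids = set()
    for _, pid, ci in pairs:
        # Each known instance and each detection is matched at most once.
        if assigned[ci] is None and pid not in used_ids:
            assigned[ci] = pid
            used_ids.add(pid)

    # Detections without a match are treated as new fruit instances.
    next_id = max(prev, default=-1) + 1
    for ci in range(len(curr)):
        if assigned[ci] is None:
            assigned[ci] = next_id
            next_id += 1
    return assigned


known = {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0)}
detections = [(1.01, 0.0, 0.0), (0.0, 0.02, 0.0), (5.0, 5.0, 5.0)]
ids = match_centroids(known, detections)
```

In this toy example the first two detections inherit the IDs of the nearby known fruits, and the third, which is far from both, gets a fresh ID. An optimal assignment (for example the Hungarian method) could replace the greedy loop where fruits are densely clustered.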
Key results
- Real-time intra-sequence fruit instance tracking
- Cross-session fruit association across weeks of growth
- Generation of spatially aligned 3D voxel maps with persistent instance labels
- Higher accuracy and F1-scores than baselines on real greenhouse data
Why it matters
Provides a low-cost, deployable solution for robotic precision agriculture to monitor crop development and optimize yield over time.
Abstract
Automatic fruit growth monitoring plays a vital role in advancing precision agriculture. Tracking the evolution of fruits over time is essential to monitor their development and optimize production. The ability to recognize fruits over periods of time, even with drastic scene changes, is a required capability of agricultural robots. This paper presents a system that allows long-term fruit tracking in 3D data. It generates instance-segmented 3D representations of plants at various growth stages over time, utilizing only consumer-grade RGB-D cameras installed on a mobile robot. Our approach first performs instance segmentation on each image in a sequence. Then, by exploiting geometric information and depth maps, we track the same instances throughout the sequence. We produce a 3D point cloud containing instances, exploiting odometry information and 3D semantic mapping. Once our robot performs a new recording at a different plant growth stage, it associates each fruit with the previously built 3D cloud and updates the model. We validate the system in a real-world glasshouse environment in Bonn, Germany. Experimental results demonstrate that our system outperforms existing baselines even though it relies only on annotated images and operates at frame rate, allowing deployment on a real robot.
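To make the role of the depth maps concrete, the sketch below back-projects the pixels of one segmented fruit into 3D using the standard pinhole camera model and averages them into a centroid. It is an assumption-laden illustration rather than the method described in the paper: the function name `mask_centroid_3d`, the list-of-lists image layout, and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and zero depth is assumed to mark invalid pixels.

```python
def mask_centroid_3d(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels and return their 3D centroid.

    depth: 2D list of depths in metres (0 is assumed to mean "no reading")
    mask:  2D list of booleans, True where the pixel belongs to the fruit
    fx, fy, cx, cy: pinhole intrinsics in pixels (hypothetical values)
    Returns (x, y, z) in the camera frame, or None if no valid pixel exists.
    """
    sum_x = sum_y = sum_z = 0.0
    count = 0
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            if inside and z > 0:
                # Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
                sum_x += (u - cx) * z / fx
                sum_y += (v - cy) * z / fy
                sum_z += z
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count, sum_z / count)


depth_map = [
    [0.0, 0.0, 0.0],
    [0.0, 2.0, 2.0],
    [0.0, 0.0, 0.0],
]
fruit_mask = [
    [False, False, False],
    [False, True, True],
    [False, False, True],
]
centroid = mask_centroid_3d(depth_map, fruit_mask, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

A centroid computed this way lives in the camera frame; transforming it with the robot's odometry pose would place it in the map frame, where it can be compared with centroids from other frames or sessions.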