ICRA 2026

Spatio-Temporal Consistent Semantic Mapping for Robotics Fruit Growth Monitoring

Cyrill Stachniss

AI summary

A real-time pipeline using consumer RGB-D cameras enables consistent long-term 3D tracking and mapping of individual fruits across different growth stages, outperforming existing baselines.

Fruit tracking; 3D mapping; RGB-D cameras; Precision agriculture; Instance segmentation; Spatio-temporal consistency

Problem

Automated long-term fruit monitoring requires recognizing and tracking the same individual fruits across multiple recordings despite drastic scene changes, occlusions, and plant evolution. Existing methods either struggle to do this consistently or require expensive hardware.

Approach

The system captures video with an RGB-D camera mounted on a mobile robot, segments individual fruits in each frame, and keeps instance IDs consistent across frames and sessions by matching 3D centroids and aligning new poses to a reference map.
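
The centroid-matching step can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a simple greedy nearest-neighbour assignment, and the function name `associate_by_centroid`, the integer instance IDs, and the 5 cm gating distance `max_dist` are all hypothetical choices made for illustration. It also assumes the observed centroids have already been aligned to the reference map frame.

```python
import math


def associate_by_centroid(reference, observed, max_dist=0.05):
    """Greedily match observed fruit centroids to reference-map centroids.

    reference: dict mapping integer instance ID -> (x, y, z) in the map frame, metres.
    observed:  list of (x, y, z) centroids, assumed already aligned to the map frame.
    max_dist:  hypothetical gating distance; pairs farther apart are never matched.
    Returns one instance ID per observed centroid; unmatched ones get fresh IDs.
    """
    # Collect every candidate pair inside the gate, closest pairs first.
    pairs = sorted(
        (math.dist(ref_xyz, obs_xyz), ref_id, obs_idx)
        for ref_id, ref_xyz in reference.items()
        for obs_idx, obs_xyz in enumerate(observed)
        if math.dist(ref_xyz, obs_xyz) <= max_dist
    )

    assigned = {}      # observed index -> reference ID
    used_refs = set()  # reference IDs that are already taken
    for _, ref_id, obs_idx in pairs:
        if obs_idx not in assigned and ref_id not in used_refs:
            assigned[obs_idx] = ref_id
            used_refs.add(ref_id)

    # Anything left unmatched is treated as a newly appeared fruit.
    next_id = max(reference, default=-1) + 1
    ids = []
    for obs_idx in range(len(observed)):
        if obs_idx in assigned:
            ids.append(assigned[obs_idx])
        else:
            ids.append(next_id)
            next_id += 1
    return ids


reference_map = {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0)}
new_session = [(1.01, 0.0, 0.0), (0.02, 0.0, 0.0), (5.0, 5.0, 5.0)]
ids = associate_by_centroid(reference_map, new_session)
```

A greedy assignment is the simplest option; an optimal one-to-one assignment (for example the Hungarian method) would be a natural alternative when fruits grow close together.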

Key results

Real-time intra-sequence fruit instance tracking
Cross-session fruit association across weeks of growth
Generation of spatially aligned 3D voxel maps with persistent instance labels
Higher accuracy and F1-scores than baselines on real greenhouse data
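
For reference, the F1-score is the harmonic mean of precision and recall. The sketch below uses the standard definition; how the paper counts true positives, false positives, and false negatives for fruit associations is not stated in this summary, so the counts in the example are made up.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct associations, 2 wrong ones, 2 missed fruits.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```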

Why it matters

Provides a low-cost, deployable solution for robotic precision agriculture to monitor crop development and optimize yield over time.

Abstract

Automatic fruit growth monitoring plays a vital role in advancing precision agriculture. Tracking the evolution of fruits over time is essential to monitor their development and optimize production. The ability to recognize fruits over periods of time, even with drastic scene changes, is a required capability of agricultural robots. This paper presents a system that allows long-term fruit tracking in 3D data. It generates instance-segmented 3D representations of plants at various growth stages over time, utilizing only consumer-grade RGB-D cameras installed on a mobile robot. Our approach first performs instance segmentation on each image in a sequence. Then, by exploiting geometric information and depth maps, we track the same instances throughout the sequence. We produce a 3D point cloud containing instances, exploiting odometry information and 3D semantic mapping. Once our robot performs a new recording at a different plant growth stage, it associates each fruit with the previously built 3D cloud and updates the model. We validate the system in a real-world glasshouse environment in Bonn, Germany. Experimental results demonstrate that our system outperforms existing baselines even though it relies only on annotated images and operates at frame rate, allowing deployment on a real robot.
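
As a rough illustration of how depth maps and odometry can place a segmented fruit in a common 3D frame, the sketch below back-projects masked pixels with a pinhole camera model and averages them into a centroid. It is an assumption-laden example, not the paper's implementation: the function names, the intrinsics tuple `(fx, fy, cx, cy)`, and the pose given as a rotation matrix plus translation are all hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with depth in metres (camera frame)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_map_frame(point, rotation, translation):
    """Apply a rigid pose: rotation is a 3x3 nested list, translation has length 3."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def instance_centroid(pixels, intrinsics, rotation, translation):
    """Mean map-frame position of masked pixels given as (u, v, depth).

    Pixels with non-positive depth (missing sensor readings) are skipped.
    Returns None if no valid pixel remains.
    """
    points = [
        to_map_frame(backproject(u, v, d, *intrinsics), rotation, translation)
        for u, v, d in pixels
        if d > 0
    ]
    if not points:
        return None
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Hypothetical intrinsics and a camera pose shifted 1 m along x, no rotation.
intrinsics = (500.0, 500.0, 320.0, 240.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mask_pixels = [(320, 240, 2.0), (420, 240, 2.0), (0, 0, 0.0)]
centroid = instance_centroid(mask_pixels, intrinsics, identity, (1.0, 0.0, 0.0))
```

Centroids computed this way, one per segmented instance, are the kind of input a centroid-based association step would consume.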

Index terms

Robotics and Automation in Agriculture and Forestry; Mapping