ICRA 2026

SPILL: Size, Pose, and Internal Liquid Level Estimation of Transparent Glassware for Robotic Bartending

Louis Adriaens, Thomas Lips, Mathieu De Coster, Andreas Verleysen, Francis wyffels

AI summary

A lightweight single-view pipeline enables robots to accurately estimate the size, pose, and liquid level of unknown transparent glasses, achieving a 93.6% success rate in 500 autonomous pouring tasks.

transparent object perception, robotic pouring, semantic keypoints, RGB-D vision, service robotics, depth estimation

Problem

Robotic manipulation of transparent objects is hindered by refraction, lack of texture, and unreliable RGB-D depth data, making it difficult to localize and interact with unknown glassware in unstructured environments.

Approach

SPILL combines object detection with semantic keypoint detection on a single RGB-D image, then lifts the 2D keypoints to 3D using the support plane and a tapered-cylinder glass model, estimating geometry without depth completion or object-specific models.
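The lifting step can be illustrated with a minimal geometric sketch. This is not SPILL's code; it assumes a calibrated pinhole camera with hypothetical intrinsics `(fx, fy, cx, cy)`, a support plane `n.X + d = 0` already estimated in the camera frame (unit normal `n` pointing up, away from the table), an upright glass whose axis is parallel to `n`, and four hypothetical keypoints (`base_left`, `base_right`, `rim_left`, `rim_right`) on the glass silhouette. Base keypoints are intersected with the plane, the height is found as the point on the glass axis closest to the camera ray through the rim midpoint, and rim keypoints are intersected with a plane parallel to the support plane at that height.

```python
import math

# Illustrative sketch, NOT the SPILL implementation. Assumptions (hypothetical):
# pinhole intrinsics K = (fx, fy, cx, cy), camera at the origin of the frame,
# support plane n.X + d = 0 with unit normal n pointing up (away from the table),
# and an upright glass whose axis is parallel to n.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pixel_ray(u, v, K):
    """Back-project pixel (u, v) to a camera-frame ray direction with z = 1."""
    fx, fy, cx, cy = K
    return ((u - cx) / fx, (v - cy) / fy, 1.0)

def lift_to_plane(u, v, K, n, d):
    """Intersect the camera ray through (u, v) with the plane n.X + d = 0."""
    r = pixel_ray(u, v, K)
    denom = dot(n, r)
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the plane")
    t = -d / denom
    return tuple(t * ri for ri in r)

def height_on_axis(base_center, n, u, v, K):
    """Parameter s of the point base_center + s*n closest to the ray through (u, v)."""
    r = pixel_ray(u, v, K)
    w0 = base_center  # base_center minus the camera origin (0, 0, 0)
    a, b, c = dot(n, n), dot(n, r), dot(r, r)
    dn, e = dot(n, w0), dot(r, w0)
    return (b * e - c * dn) / (a * c - b * b)

def estimate_tapered_cylinder(kp, K, n, d):
    """Estimate base/rim diameters and height from four 2D keypoints (hypothetical names)."""
    bl = lift_to_plane(*kp["base_left"], K, n, d)
    br = lift_to_plane(*kp["base_right"], K, n, d)
    base_center = tuple((p + q) / 2 for p, q in zip(bl, br))
    # Approximation: the midpoint of the two rim pixels is treated as the rim centre.
    rim_mid = tuple((p + q) / 2 for p, q in zip(kp["rim_left"], kp["rim_right"]))
    height = height_on_axis(base_center, n, *rim_mid, K)
    # Points at height h above the table satisfy n.X + (d - h) = 0.
    rl = lift_to_plane(*kp["rim_left"], K, n, d - height)
    rr = lift_to_plane(*kp["rim_right"], K, n, d - height)
    return {
        "base_center": base_center,
        "base_diameter": math.dist(bl, br),
        "rim_diameter": math.dist(rl, rr),
        "height": height,
    }
```

On synthetic keypoints projected from a known glass (K = (600, 600, 320, 240), table 0.3 m below the camera so n = (0, -1, 0) and d = 0.3, keypoints at (296, 420), (344, 420), (290, 348), (350, 348)), the sketch recovers a 0.12 m tall glass with 0.08 m base and 0.10 m rim diameters. In this sketch the only 3D input is the plane, so no depth reading on the glass itself is used; real rim keypoints on a perspective-distorted ellipse would make the midpoint and diameter steps approximate.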

Key results

93.6% success rate across 500 autonomous pours with 20 unseen glasses
Real-time inference at ~40 FPS on standard CPU/GPU hardware
98.3% success rate (62 drinks served) in a recorded session at a live public event
Introduction of the Glasses-in-the-Wild dataset for robust keypoint training

Why it matters

Enables scalable, real-world transparent object interaction for service and assistive robotics without requiring specialized sensors, depth completion, or object-specific 3D models.

Abstract

Robotic perception of transparent objects presents unique challenges due to their refractive properties, lack of texture, and limitations of conventional RGB-D sensors in capturing reliable depth information. These challenges significantly hinder robotic manipulation capabilities in real-world settings such as household assistance, hospitality, and healthcare. To address these issues, we propose SPILL, a lightweight perception pipeline for Size, Pose, and Internal Liquid Level estimation of unknown transparent glassware using a single view. SPILL combines object detection with semantic keypoint detection and operates without requiring object-specific 3D models or depth completion. We demonstrate its effectiveness in autonomous robotic pouring tasks. Additionally, to enhance the robustness and generalization of keypoint detection to diverse real-world scenarios, we introduce Glasses-in-the-Wild, a new dataset that captures a wide variety of glass types in realistic environments. Evaluated on a robot manipulator, SPILL achieves a 93.6% success rate across 500 autonomous pours with 20 unseen glasses in three diverse real-world scenes. We further demonstrate robustness through multiple live public events in real-world, human-centered environments. In one recorded session, the robot autonomously served 62 drinks with a 98.3% success rate. These results demonstrate that task-relevant keypoint detection enables scalable, real-world transparent object interaction, paving the way for practical applications in service and assistive robotics, without spilling a drop. Dataset and code are available at https://github.com/Louadria/SPILL.
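Once a tapered-cylinder model and an internal liquid level are available, deciding when to stop pouring reduces to frustum geometry. The sketch below is an illustrative assumption, not taken from the paper: it treats the glass interior as a frustum with base radius r0, rim radius r1 and height H (all in metres), ignores wall and base thickness, computes the liquid volume at height h as V(h) = pi * h * (r0^2 + r0 * r(h) + r(h)^2) / 3 with r(h) = r0 + (r1 - r0) * h / H, and inverts it by bisection, which is valid because V(h) increases monotonically with h.

```python
import math

# Illustrative sketch (assumption, not SPILL's method): glass interior modelled as a
# frustum of base radius r0, rim radius r1 and height H; wall thickness is ignored.

def frustum_volume(r0, r1, H, h):
    """Liquid volume (m^3) when the glass is filled to height h (m)."""
    rh = r0 + (r1 - r0) * h / H  # radius grows linearly from base to rim
    return math.pi * h * (r0 * r0 + r0 * rh + rh * rh) / 3.0

def fill_height_for_volume(r0, r1, H, target, tol=1e-9):
    """Bisection for the liquid height holding `target` m^3 (V is monotonic in h)."""
    if target > frustum_volume(r0, r1, H, H):
        raise ValueError("target volume exceeds glass capacity")
    lo, hi = 0.0, H
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frustum_volume(r0, r1, H, mid) < target:
            lo = mid  # too little liquid at mid: the answer lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `fill_height_for_volume(0.04, 0.05, 0.12, 200e-6)` returns the liquid height in metres at which a hypothetical 200 ml pour into a 0.12 m glass would stop. Bisection is used instead of solving the cubic in closed form because it is short, robust for both widening and narrowing glasses, and its tolerance is expressed directly in metres.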

Index terms

Perception for Grasping and Manipulation; Object Detection, Segmentation and Categorization; Service Robotics