SPILL: Size, Pose, and Internal Liquid Level Estimation of Transparent Glassware for Robotic Bartending
Louis Adriaens, Thomas Lips, Mathieu De Coster, Andreas Verleysen, Francis wyffels
AI summary
Problem
Robotic manipulation of transparent objects is hindered by refraction, lack of texture, and unreliable RGB-D depth data, making it difficult to localize and interact with unknown glassware in unstructured environments.
Approach
SPILL combines object detection with semantic keypoint detection on a single RGB-D image, lifting 2D keypoints to 3D via a support plane and a tapered-cylinder model to estimate glass geometry without depth completion or object-specific models.
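The Approach line compresses the geometric lifting step into a single sentence. Below is a minimal sketch of one way 2D keypoints can be lifted to 3D with a support plane and a tapered-cylinder model. It assumes a pinhole camera with known intrinsics K and a support plane n·X + d = 0 already expressed in the camera frame (for instance fitted to depth on the opaque table, which is our assumption, not a stated detail). It also assumes four hypothetical keypoints, bottom_left, bottom_right, top_left and top_right, at the left and right extremes of the base and rim. The keypoint names, the function names, and the approximation that these extremes are diameter endpoints are all illustrative; this is not SPILL's actual keypoint set or code.

```python
import numpy as np


def backproject(uv, K):
    """Viewing ray (camera centre at the origin) through pixel uv for pinhole intrinsics K."""
    u, v = uv
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) / fx, (v - cy) / fy, 1.0])


def intersect_ray_plane(ray, n, d, height=0.0):
    """Point t*ray on the plane n.X + d = height, i.e. parallel to the support plane."""
    denom = n @ ray
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the plane")
    return (height - d) / denom * ray


def axis_param_closest_to_ray(ray, c, n):
    """Parameter s of the axis point c + s*n that lies closest to the ray t*ray."""
    w0 = -c  # ray origin (camera centre = 0) minus axis origin
    a_rr, b_rn, c_nn = ray @ ray, ray @ n, n @ n
    d_rw, e_nw = ray @ w0, n @ w0
    denom = a_rr * c_nn - b_rn * b_rn
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the glass axis")
    return (a_rr * e_nw - b_rn * d_rw) / denom


def lift_glass(kp, K, n, d):
    """Estimate base centre, base radius, top radius and height of a tapered cylinder.

    kp maps the hypothetical names bottom_left/bottom_right/top_left/top_right to
    pixel coordinates; n.X + d = 0 is the support plane in the camera frame, with
    n pointing from the table towards the glass.
    """
    scale = np.linalg.norm(n)
    n, d = n / scale, d / scale  # unit normal, so heights come out in metric units

    # 1. Base: the bottom keypoints touch the table, so their rays end on the plane.
    bl = intersect_ray_plane(backproject(kp["bottom_left"], K), n, d)
    br = intersect_ray_plane(backproject(kp["bottom_right"], K), n, d)
    base_center = (bl + br) / 2
    base_radius = np.linalg.norm(br - bl) / 2

    # 2. Height: assume an upright glass whose axis is the plane normal through the
    #    base centre; take the axis point nearest the ray through the 2D rim midpoint.
    rim_mid = (np.asarray(kp["top_left"], float) + np.asarray(kp["top_right"], float)) / 2
    height = axis_param_closest_to_ray(backproject(rim_mid, K), base_center, n)

    # 3. Rim: the top keypoints lie on a plane parallel to the table at that height.
    tl = intersect_ray_plane(backproject(kp["top_left"], K), n, d, height)
    tr = intersect_ray_plane(backproject(kp["top_right"], K), n, d, height)
    top_radius = np.linalg.norm(tr - tl) / 2

    return base_center, base_radius, top_radius, height


if __name__ == "__main__":
    K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
    # Synthetic scene: table at y = 0.3 m (camera y axis points down), so n = (0, -1, 0), d = 0.3.
    n, d = np.array([0.0, -1.0, 0.0]), 0.3

    def project(X):
        return K[0, 0] * X[0] / X[2] + K[0, 2], K[1, 1] * X[1] / X[2] + K[1, 2]

    kp = {
        "bottom_left": project([-0.03, 0.3, 1.0]), "bottom_right": project([0.03, 0.3, 1.0]),
        "top_left": project([-0.04, 0.2, 1.0]), "top_right": project([0.04, 0.2, 1.0]),
    }
    _, r_base, r_top, h = lift_glass(kp, K, n, d)
    print(round(float(r_base), 4), round(float(r_top), 4), round(float(h), 4))  # recovers 0.03 0.04 0.1
```

The synthetic keypoints here are exact diameter endpoints at the same depth as the axis, so the recovery is exact. On real images the silhouette extremes of a rim seen in perspective are not exactly diameter endpoints, so a practical pipeline would need to account for that bias. The design choice the sketch illustrates is that only the opaque support plane needs reliable depth; everything else follows from rays and the upright tapered-cylinder assumption.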
Key results
- 93.6% success rate across 500 autonomous pours with 20 unseen glasses
- Real-time inference at ~40 FPS on standard CPU/GPU hardware
- 98.3% success rate (62 drinks served autonomously) in a recorded session at a live public event
- Introduction of the Glasses-in-the-Wild dataset for robust keypoint training
Why it matters
Enables scalable, real-world transparent object interaction for service and assistive robotics without requiring specialized sensors or pre-trained object models.
Abstract
Robotic perception of transparent objects presents unique challenges due to their refractive properties, lack of texture, and the limitations of conventional RGB-D sensors in capturing reliable depth information. These challenges significantly hinder robotic manipulation capabilities in real-world settings such as household assistance, hospitality, and healthcare. To address these issues, we propose SPILL: a lightweight perception pipeline for Size, Pose, and Internal Liquid Level estimation of unknown transparent glassware from a single view. SPILL combines object detection with semantic keypoint detection and operates without requiring object-specific 3D models or depth completion. We demonstrate its effectiveness in autonomous robotic pouring tasks. Additionally, to enhance the robustness and generalization of keypoint detection to diverse real-world scenarios, we introduce Glasses-in-the-Wild, a new dataset that captures a wide variety of glass types in realistic environments. Evaluated on a robot manipulator, SPILL achieves a 93.6% success rate across 500 autonomous pours with 20 unseen glasses in three diverse real-world scenes. We further demonstrate robustness at multiple live public events in real-world, human-centered environments. In one recorded session, the robot autonomously served 62 drinks with a 98.3% success rate. These results demonstrate that task-relevant keypoint detection enables scalable, real-world transparent object interaction, paving the way for practical applications in service and assistive robotics, without spilling a drop. Dataset and code are available at https://github.com/Louadria/SPILL.
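Once a glass is modelled as a tapered cylinder, an estimated internal liquid level can be related to a liquid volume, which is the quantity a pouring task ultimately cares about. The abstract does not say that SPILL performs this conversion; the sketch below only illustrates what the tapered-cylinder geometry makes possible. It assumes an upright glass with thin walls, a flat inner bottom at the base, the level measured along the axis from that bottom, and a radius that varies linearly from r_base to r_top. Under those assumptions the filled part is a conical frustum with volume V = πh/3 · (R² + R·r + r²). The function names are hypothetical.

```python
import math


def radius_at(level, r_base, r_top, height):
    """Interior radius of a tapered cylinder at `level` above the inner bottom."""
    return r_base + (r_top - r_base) * level / height


def liquid_volume(level, r_base, r_top, height):
    """Liquid volume up to `level` via the conical-frustum formula pi*h/3*(R^2 + R*r + r^2)."""
    level = min(max(level, 0.0), height)  # clamp the level to the glass
    r_level = radius_at(level, r_base, r_top, height)
    return math.pi * level / 3.0 * (r_base**2 + r_base * r_level + r_level**2)


def level_for_volume(volume, r_base, r_top, height, tol=1e-9):
    """Invert liquid_volume by bisection; volume grows monotonically with level."""
    if volume >= liquid_volume(height, r_base, r_top, height):
        return height  # the target does not fit, so the glass would be full
    lo, hi = 0.0, height
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if liquid_volume(mid, r_base, r_top, height) < volume:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with radii and height in metres, `level_for_volume(2e-4, 0.03, 0.04, 0.1)` gives the level at which a 200 mL pour into that hypothetical glass would stop, and `liquid_volume(level, ...)` converts a measured level back into millilitres (multiply m³ by 10⁶). Bisection is used instead of solving the cubic in closed form because it is short, robust, and handles the straight-cylinder case r_base = r_top without special-casing.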