ClearDepth: Efficient Stereo Perception of Transparent Objects for Robotic Manipulation
Kaixin Bai, Huajian Zeng, Lei Zhang, Yiwen Liu, Hongli Xu, Zhaopeng Chen, Jianwei Zhang
AI summary
Problem
Standard stereo sensors and matching algorithms fail on transparent objects due to light refraction and reflection, producing unreliable depth maps that hinder robotic manipulation.
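To illustrate why matching failures matter, the sketch below uses the standard rectified-stereo relation Z = f * B / d to show how a disparity error turns into a depth error. The camera parameters (`FOCAL_PX`, `BASELINE_M`) and the disparity values are hypothetical numbers chosen for illustration; they are not taken from the paper.

```python
import math


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity (pixels) to metric depth via Z = f * B / d.

    Returns math.inf for non-positive disparity, where no finite depth exists.
    """
    if disparity_px <= 0:
        return math.inf
    return focal_px * baseline_m / disparity_px


# Hypothetical rectified stereo rig (illustrative values only).
FOCAL_PX = 640.0      # focal length in pixels
BASELINE_M = 0.0625   # distance between the two cameras in metres

# Disparity of a surface point when matching succeeds.
correct_depth = disparity_to_depth(40.0, FOCAL_PX, BASELINE_M)

# If refraction makes the matcher lock onto the background instead,
# the disparity shrinks and the estimated depth jumps.
mismatched_depth = disparity_to_depth(32.0, FOCAL_PX, BASELINE_M)

depth_error_m = mismatched_depth - correct_depth
```

Because depth is inversely proportional to disparity, even a mismatch of a few pixels on a glass surface can move the estimated point by tens of centimetres, which is enough to make a grasp fail.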
Approach
The method uses a cascaded vision transformer to extract structural cues and a lightweight GRU-based post-fusion module to combine them with appearance features, trained on a physically realistic synthetic dataset to bridge the simulation-to-reality gap.
Key results
- Outperforms state-of-the-art stereo matching methods in disparity accuracy on transparent objects
- Increases real-world transparent object grasp success rate by at least 18%
- Introduces SynClearDepth, a photo-realistic dataset with 14,091 stereo pairs and precise depth labels
- Demonstrates strong Sim2Real generalization for cluttered indoor robotic manipulation
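Disparity accuracy in stereo matching is commonly reported as the mean absolute disparity error, often called end-point error (EPE). The summary does not say which metrics the authors use, so the following is only a minimal sketch of that common metric, restricted to a mask of transparent pixels. The function name `masked_epe` and the toy values are assumptions made for illustration.

```python
def masked_epe(pred, gt, mask):
    """Mean absolute disparity error over pixels where mask is truthy.

    pred, gt, mask are equally sized 2D lists (rows of values).
    """
    total = 0.0
    count = 0
    for pred_row, gt_row, mask_row in zip(pred, gt, mask):
        for p, g, m in zip(pred_row, gt_row, mask_row):
            if m:
                total += abs(p - g)
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return total / count


# Toy 2x2 disparity maps (pixels); the mask marks hypothetical transparent pixels.
pred = [[10.0, 20.0], [30.0, 40.0]]
gt = [[11.0, 20.0], [33.0, 50.0]]
transparent_mask = [[1, 0], [1, 0]]

epe = masked_epe(pred, gt, transparent_mask)
```

Evaluating only inside the transparent mask keeps well-behaved opaque regions from hiding errors on the objects of interest.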
Why it matters
Provides a scalable, accurate perception pipeline that enables service and logistics robots to reliably handle transparent items in real-world environments.
Abstract
Transparent object depth perception remains a major challenge in robotics and logistics due to the limitations of standard 3D sensors in capturing accurate depth on transparent and reflective surfaces. This affects applications relying on depth maps and point clouds, particularly in robotic manipulation. To address this, we propose ClearDepth, a vision transformer-based algorithm for stereo depth recovery of transparent objects, enhanced by a novel feature post-fusion module that refines depth estimation using structural visual features. To mitigate the high costs of stereo dataset collection, we introduce a physically realistic, domain-adaptive Sim2Real framework for efficient data generation. Our method outperforms state-of-the-art stereo matching approaches on transparent depth recovery. Furthermore, in transparent object grasping experiments, ClearDepth improves transparent-scene perception and achieves at least an 18% higher grasp success rate compared to the state-of-the-art methods for transparent object manipulation. Our method demonstrates strong Sim2Real generalization, enabling precise depth perception of transparent objects for robotic applications in the real world. Dataset and project details are available at https://sites.google.com/view/cleardepth/.