TORM: Transparent Objects Reconstruction and Manipulation with Multi-View Segmentation
Qiyuan Qiao, Fuling Lin, Huibin Zhao, Bowen Xu, Zhiqiang Chen, Dong Xu, Peng Lu
AI summary
Problem
Accurately reconstructing and grasping multiple transparent objects is hindered by complex light refraction, lack of distinct visual features, and the tendency of existing methods to merge objects or get trapped in suboptimal geometric solutions.
Approach
The method extracts multi-view semantic silhouettes to constrain a self-supervised deep marching tetrahedra network. It then applies a progressive envelope loss during fitting and connectivity-based mesh separation afterward, so that individual objects are reconstructed and their grasp poses predicted in parallel.
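The connectivity-based separation step can be illustrated with a generic union-find over a triangle mesh. The sketch below is not the authors' implementation. It assumes the fitted mesh is given as a vertex list plus triangle faces with shared (welded) vertex indices, and it treats two faces as connected when they share a vertex. The names split_mesh_components and extract_submesh are hypothetical.

```python
from collections import defaultdict


def split_mesh_components(faces):
    """Group triangle faces into connected components.

    Assumption: faces share vertex indices (a welded mesh), and two faces
    are connected if they share at least one vertex index.
    Returns a list of components, each a list of face indices.
    """
    parent = {}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Register every vertex, then merge all vertices of each face.
    for face in faces:
        for v in face:
            parent.setdefault(v, v)
        for v in face[1:]:
            union(face[0], v)

    # Bucket faces by the root of their first vertex.
    groups = defaultdict(list)
    for i, face in enumerate(faces):
        groups[find(face[0])].append(i)
    # Order components by their smallest face index.
    return sorted(groups.values())


def extract_submesh(vertices, faces, face_ids):
    """Copy one component into its own vertex/face lists with compact indices."""
    remap = {}
    new_vertices, new_faces = [], []
    for fi in face_ids:
        new_face = []
        for v in faces[fi]:
            if v not in remap:
                remap[v] = len(new_vertices)
                new_vertices.append(vertices[v])
            new_face.append(remap[v])
        new_faces.append(tuple(new_face))
    return new_vertices, new_faces
```

For example, two disjoint tetrahedral surfaces (faces 0 to 3 on vertices 0 to 3, faces 4 to 7 on vertices 4 to 7) split into `[[0, 1, 2, 3], [4, 5, 6, 7]]`, and each component can then be passed to a grasp network independently. Vertices that coincide in space but carry different indices are not merged by this sketch.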
Key results
- Simultaneous multi-object reconstruction via DMTet-Multi
- Progressive envelope loss prevents local minima trapping
- Connectivity-based mesh separation for parallel grasp planning
- 88.8% real-world grasping success rate
Why it matters
Enables reliable RGB-only perception and manipulation of transparent items, advancing robotic automation for laboratory, household, and AR applications.
Abstract
Transparent objects are common in daily life and industry, necessitating that robots be able to perceive and manipulate them. The physical properties of reflection and refraction pose challenges for accurately reconstructing the 3D geometry of transparent objects. Conventional methods, which rely on simultaneous estimation of background ambient light and complex refraction fields, lack robustness in real-world scenes, thereby impeding robotic grasping performance. To address this issue, this paper proposes TORM, a novel framework for robust reconstruction and manipulation of multiple transparent objects. TORM focuses on semantic information from transparent objects and employs multi-view segmentation masks to constrain a self-supervised multi-object deep marching tetrahedra (DMTet-Multi) 3D fitting process. To mitigate the risk of the geometry representation getting stuck in suboptimal solutions during multi-transparent-object reconstruction, we design a novel loss function that prevents marching tetrahedra from crossing boundaries. By applying a connectivity determination strategy to the fitted mesh, transparent objects can be processed in parallel by a grasp perception network, predicting the end-effector configuration for grasp tasks. Real-world experiments demonstrate that TORM achieves an 88.8% grasping success rate in multi-transparent-object grasping tasks.
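To make the idea of constraining geometry with multi-view segmentation masks concrete, here is a generic silhouette loss of the kind often used in mask-supervised mesh fitting. It is an assumption for illustration, not TORM's loss: it omits the differentiable rendering that produces predicted masks from the mesh, any per-object handling, and the boundary-crossing (envelope) term described above. The function names soft_iou_loss and multiview_silhouette_loss are hypothetical, and masks are represented as flattened lists of per-pixel values in [0, 1].

```python
def soft_iou_loss(pred, target, eps=1e-6):
    """Return 1 - soft IoU between a predicted soft mask and a target mask.

    Both masks are flattened sequences of per-pixel values in [0, 1].
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(p * t for p, t in zip(pred, target))
    # Soft union: p + t - p * t equals max(p, t) for binary values.
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


def multiview_silhouette_loss(pred_masks, target_masks):
    """Average the per-view silhouette loss over all camera views."""
    if not pred_masks or len(pred_masks) != len(target_masks):
        raise ValueError("need one predicted mask per target view")
    losses = [soft_iou_loss(p, t) for p, t in zip(pred_masks, target_masks)]
    return sum(losses) / len(losses)
```

A view where the rendered silhouette matches the segmentation mask contributes a loss near 0, while a fully disjoint view contributes a loss near 1, so averaging over views pushes the shape toward agreement with every mask at once.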