DexSinGrasp: Learning a Unified Policy for Dexterous Object Singulation and Grasping in Densely Cluttered Environments
Lixin Xu, Zixuan Liu, Zhewei Gui, Jingxiang Guo, Zeyu Jiang, Tongzhou Zhang, Zhixuan Xu, Chongkai Gao, Lin Shao
AI summary
Problem
Grasping in densely cluttered environments is challenging because existing methods lack explicit singulation training and struggle when objects tightly occlude the target, leading to low success rates and inefficient manipulation.
Approach
The authors propose DexSinGrasp, a unified reinforcement learning policy that jointly optimizes finger-driven singulation and grasping, trained via progressive clutter complexity and distilled into a vision-based policy for real-world deployment.
Key results
- Outperforms baselines in success rate and efficiency within dense clutter
- Successfully generalizes across diverse object arrangements and occlusion levels
- Demonstrates superior efficiency by leveraging finger dexterity over palm-driven approaches
- Enables real-world deployment via vision-based policy distillation
Why it matters
Advances robotic manipulation in complex, real-world scenarios by providing a scalable, efficient method for dexterous hands to handle tightly packed objects, benefiting manufacturing, logistics, and assembly automation.
Abstract
Grasping objects in cluttered environments remains a fundamental yet challenging problem in robotic manipulation. While prior works have explored learning-based synergies between pushing and grasping for two-fingered grippers, few have leveraged the high degrees of freedom (DoF) in dexterous hands to perform efficient singulation for grasping in cluttered settings. In this work, we introduce DexSinGrasp, a unified policy for dexterous object singulation and grasping. DexSinGrasp enables high-dexterity object singulation to facilitate grasping, significantly improving efficiency and effectiveness in cluttered environments. We incorporate clutter arrangement curriculum learning to enhance success rates and generalization across diverse clutter conditions, while policy distillation enables a deployable vision-based grasping strategy. To evaluate our approach, we introduce a set of cluttered grasping tasks with varying object arrangements and occlusion levels. Experimental results show that our method outperforms baselines in both efficiency and grasping success rate, particularly in dense clutter. Codes, appendix, and videos are available on our website https://nus-lins-lab.github.io/dexsingweb/.
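The abstract does not spell out how the clutter arrangement curriculum is scheduled. As a rough illustration only, one common way to build such a curriculum is a scheduler that moves training to a denser arrangement once the recent success rate clears a threshold. Everything in the sketch below (the class name `ClutterCurriculum`, the stage labels, the threshold, and the window size) is a hypothetical choice for illustration and is not taken from the paper or its code release.

```python
from collections import deque


class ClutterCurriculum:
    """Hypothetical scheduler: advance to denser clutter once success is high enough."""

    def __init__(self, stages, threshold=0.8, window=50):
        self.stages = list(stages)            # ordered from easiest to hardest
        self.threshold = threshold            # assumed promotion threshold
        self.window = window                  # number of recent episodes considered
        self.index = 0                        # current stage index
        self.results = deque(maxlen=window)   # rolling record of episode outcomes

    @property
    def stage(self):
        return self.stages[self.index]

    def record(self, success):
        """Log one episode outcome and return the stage to use next."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.window
        rate = sum(self.results) / len(self.results)
        is_last = self.index == len(self.stages) - 1
        if window_full and rate >= self.threshold and not is_last:
            self.index += 1          # promote to the next, denser arrangement
            self.results.clear()     # start a fresh window for the new stage
        return self.stage
```

In a training loop, `record` would be called once per episode with the grasp outcome, and the returned stage would select the clutter arrangement for the next environment reset.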