Tool-Grasp: A 6-DoF Functional Grasping Framework for General-Purpose Hand Tools
Hongliang Lei, Jian Huang, Andong Li, Haoyuan Wang, Chen Liu, Wei Luo, Jiuyao Xiang
AI summary
Problem
Existing robotic grasping methods cannot detect task-valid 6-DoF grasps on tools. Three factors cause this: real-world datasets with functional labels are scarce, fine-grained functional segmentation is difficult, and grasp poses align poorly with functional regions.
Approach
The method first segments precise functional grasp regions using a mask-guided network, then fuses RGB-D, point cloud, and pose features to predict stable 6-DoF grasp poses constrained to those regions.
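The summary does not say how the functional mask constrains the grasp predictions. The sketch below shows one simple way a mask can restrict 6-DoF grasp candidates: keep only grasps whose center projects onto a functional pixel, then rank the survivors by quality score. It assumes a pinhole camera, grasps given as camera-frame centers with scalar quality scores, and a hard post-hoc filter. `select_functional_grasps` and `project_points` are hypothetical names. Tool-Grasp's QAM-GPDN learns the constraint inside the network together with its quality module, so this is an illustration of the idea, not the paper's method.

```python
import numpy as np


def project_points(points: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project (N, 3) camera-frame points (z > 0) to (N, 2) pixel coords (u, v)."""
    z = points[:, 2]
    u = K[0, 0] * points[:, 0] / z + K[0, 2]
    v = K[1, 1] * points[:, 1] / z + K[1, 2]
    return np.stack([u, v], axis=1)


def select_functional_grasps(centers, scores, mask, K, top_k=5):
    """Return indices of up to top_k grasps whose centers fall in the mask, best first.

    centers: (N, 3) grasp centers in the camera frame, in metres
    scores:  (N,) grasp quality scores (higher is better)
    mask:    (H, W) boolean functional-region mask from the segmentation stage
    K:       (3, 3) pinhole camera intrinsics
    """
    h, w = mask.shape
    inside = np.zeros(len(centers), dtype=bool)
    idx = np.flatnonzero(centers[:, 2] > 0)  # only points in front of the camera
    uv = np.rint(project_points(centers[idx], K)).astype(int)
    u, v = uv[:, 0], uv[:, 1]
    in_image = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # A grasp is admissible only if its projected center lies on a functional pixel.
    inside[idx[in_image]] = mask[v[in_image], u[in_image]]
    candidates = np.flatnonzero(inside)
    # Rank the admissible grasps by quality, highest score first.
    order = np.argsort(-scores[candidates], kind="stable")
    return candidates[order][:top_k]


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    mask = np.zeros((480, 640), dtype=bool)
    mask[200:280, 300:340] = True  # toy "handle" region
    centers = np.array([[0.0, 0.0, 0.5], [0.1, 0.0, 0.5], [0.01, 0.01, 0.5]])
    scores = np.array([0.6, 0.95, 0.8])
    print(select_functional_grasps(centers, scores, mask, K))  # [2 0]
```

In the toy example, grasp 1 has the highest score (0.95) but projects outside the handle region, so it is discarded. This is the "high-score grasp outside the functional region" failure that the mask constraint is meant to prevent.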
Key results
- Tool-Grasp dataset: 20 tool categories, 50 scenes, 12,600 RGB-D images, 250M+ 6-DoF annotations
- MG-GRSN improves functional region segmentation mIoU over baselines by 3.5% (seen) and 5.2% (unseen)
- QAM-GPDN improves functional grasp pose AP by 2.89% (seen) and 3.76% (unseen)
- Real-robot experiments validate the framework's effectiveness for real-world tool manipulation
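The segmentation result is reported in mIoU (mean intersection-over-union), which is the per-class IoU averaged over classes. Below is a minimal sketch that assumes integer label maps where 0 is background and each functional region is its own class. The summary does not specify the paper's class set or whether IoU is averaged per image or accumulated over the whole test set. Accumulating intersections and unions across all images before dividing is a common alternative.

```python
import numpy as np


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union between two integer label maps.

    Classes that appear in neither map are skipped so they do not distort the mean.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    if not ious:
        return float("nan")  # no class present in either map
    return float(np.mean(ious))


if __name__ == "__main__":
    # 0 = background, 1 = a functional region such as a handle (toy labels).
    target = np.array([[0, 0, 1], [0, 0, 1]])
    pred = np.array([[0, 0, 1], [0, 1, 1]])
    print(round(mean_iou(pred, target, num_classes=2), 4))  # 0.7083
```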
Why it matters
Advances practical robotic manipulation by enabling reliable tool handling in unstructured environments for industrial and household applications.
Abstract
Detecting functional grasp poses for tool operation is critical for robots performing complex real-world tasks, yet existing methods lack this capability. The key challenges are: 1) real-world datasets with fine-grained functional labels and task-valid grasp annotations are scarce, because building them requires domain knowledge (which makes annotation labor-intensive and subjective) and requires linking poses to how the tool is used, beyond simple stability checks; 2) fine-grained functional segmentation is difficult, because minimal differences between sub-regions are overwhelmed by global cues and noise, and methods that depend on 3D models are impractical in unstructured settings; 3) 6-DoF grasps align poorly with functional regions because tools vary widely in morphology, and existing methods either fail to balance stability against functional constraints (producing high-scoring grasps outside the functional regions) or are limited to low degrees of freedom. To address these challenges, we build the Tool-Grasp Dataset (20 tool categories, 50 scenes, 12,600 RGB-D images, and over 250M 6-DoF annotations) with fine-grained functional labels. We propose Tool-Grasp, a two-stage 6-DoF framework. In Stage 1, the Mask-Guided Grasp Region Segmentation Network (MG-GRSN) leverages tool-specific semantics to output precise functional masks, mitigating intra-tool variability. In Stage 2, the Quality-Aware Multi-Modal Grasp Pose Detection Network (QAM-GPDN) uses these masks to constrain its predictions, fusing RGB-D features with a quality module to select poses aligned with the functional regions. Experiments show that MG-GRSN outperforms baselines in mIoU by 3.5% (seen) and 5.2% (unseen), and QAM-GPDN improves functional pose AP by 2.89% (seen) and 3.76% (unseen). Real-robot experiments validate the framework's effectiveness in real-world settings.