ICRA 2026

Tool-Grasp: A 6-DoF Functional Grasping Framework for General-Purpose Hand Tools

Hongliang Lei, Jian Huang, Andong Li, Haoyuan Wang, Chen Liu, Wei Luo, Jiuyao Xiang

AI summary

A novel two-stage framework and dataset enable robots to precisely detect and execute stable 6-DoF grasps aligned with the functional regions of general-purpose hand tools.

Functional grasping, 6-DoF grasp detection, tool manipulation, RGB-D segmentation, robotic manipulation, multi-modal fusion

Problem

Existing robotic grasping methods lack task-valid 6-DoF grasp detection for tools due to scarce real-world datasets with functional labels, difficulty in fine-grained functional segmentation, and poor alignment between grasp poses and functional regions.

Approach

The method first segments precise functional grasp regions using a mask-guided network, then fuses RGB-D, point cloud, and pose features to predict stable 6-DoF grasp poses constrained to those regions.
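
The idea of constraining grasp poses to a segmented region can be illustrated with a minimal sketch. This is not the authors' QAM-GPDN, which is a learned network; it is a simple post-hoc filter, assuming each candidate grasp has a center in the camera frame and a quality score, and that the functional mask is a binary image aligned with a pinhole camera. The names `Grasp`, `project`, and `select_functional_grasps`, and the toy intrinsics, are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Grasp:
    """Hypothetical 6-DoF grasp candidate (field names are illustrative)."""
    center: Tuple[float, float, float]  # grasp center in the camera frame
    rotation: Tuple[float, ...]         # 3x3 rotation, row-major (not used by the filter)
    score: float                        # predicted quality, higher is better


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to an integer pixel (u, v)."""
    x, y, z = point
    if z <= 0:
        return None  # behind the camera, cannot be projected
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy))


def select_functional_grasps(grasps, mask, intrinsics, top_k=1):
    """Keep grasps whose center projects inside the mask, best score first."""
    fx, fy, cx, cy = intrinsics
    height, width = len(mask), len(mask[0])
    kept = []
    for g in grasps:
        pixel = project(g.center, fx, fy, cx, cy)
        if pixel is None:
            continue
        u, v = pixel
        # mask is indexed as mask[row][column], i.e. mask[v][u]
        if 0 <= v < height and 0 <= u < width and mask[v][u]:
            kept.append(g)
    kept.sort(key=lambda g: g.score, reverse=True)
    return kept[:top_k]


# Toy example: a 4x4 mask whose central 2x2 block is the functional region.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
intrinsics = (1.0, 1.0, 2.0, 2.0)  # assumed fx, fy, cx, cy
identity = (1, 0, 0, 0, 1, 0, 0, 0, 1)
candidates = [
    Grasp((0.0, 0.0, 1.0), identity, 0.60),    # projects to pixel (2, 2): inside
    Grasp((-1.0, -1.0, 1.0), identity, 0.80),  # projects to pixel (1, 1): inside
    Grasp((1.0, 1.0, 1.0), identity, 0.95),    # projects to pixel (3, 3): outside
]
best = select_functional_grasps(candidates, mask, intrinsics, top_k=2)
print([g.score for g in best])
```

In this toy case the highest-scoring candidate is rejected because it lies outside the functional region, which is the failure mode the Problem section attributes to stability-only methods.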

Key results

Tool-Grasp dataset: 20 tool categories, 50 scenes, 12,600 RGB-D images, 250M+ 6-DoF annotations
MG-GRSN improves functional region segmentation mIoU over baselines by 3.5% (seen) and 5.2% (unseen)
QAM-GPDN raises functional grasp pose AP by 2.89% (seen) and 3.76% (unseen)
Real-robot experiments validate real-world tool manipulation effectiveness
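
For readers unfamiliar with the segmentation metric above, here is a minimal sketch of intersection over union (IoU) and its mean over a set of masks. The paper's exact evaluation protocol (for example, whether the mean is taken per class or per image) is not stated here, so the averaging over mask pairs below is an assumption, and the function names are illustrative.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as nested lists."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) mask pairs."""
    scores = [iou(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


# Toy example: one half-correct prediction and one exact prediction.
pairs = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # intersection 1, union 2
    ([[1, 1], [1, 1]], [[1, 1], [1, 1]]),  # intersection 4, union 4
]
print(mean_iou(pairs))  # 0.75
```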

Why it matters

Advances practical robotic manipulation by enabling reliable tool handling in unstructured environments for industrial and household applications.

Abstract

Detecting functional grasp poses for tool operation is critical for robots in complex real-world tasks, yet existing methods lack this capability. Key challenges are: 1) Scarce real-world datasets with fine-grained functional labels and task-valid grasp annotations, as their construction requires domain knowledge (making annotation labor-intensive/subjective) and linking poses to tool usage (beyond stability checks); 2) Difficulty in fine-grained functional segmentation, where minimal sub-region differences are overwhelmed by global cues/noise, with 3D model-dependent methods impractical in unstructured settings; 3) Poor 6-DoF grasp alignment with functional regions due to high morphological heterogeneity, as existing methods either fail to balance stability and functional constraints (high-score grasps outside regions) or are limited to low degrees of freedom. To address these, we build the Tool-Grasp Dataset (20 tool categories, 50 scenes, 12,600 RGB-D images, 250M+ 6-DoF annotations) with fine-grained functional labels. We propose Tool-Grasp, a two-stage 6-DoF framework: Stage 1’s Mask-Guided Grasp Region Segmentation Network (MG-GRSN) leverages tool-specific semantics to output precise functional masks, mitigating intra-tool variability; Stage 2’s Quality-Aware Multi-Modal Grasp Pose Detection Network (QAM-GPDN) uses these masks to constrain predictions, fusing RGB-D features with a quality module to select aligned poses. Experiments show MG-GRSN outperforms baselines by 3.5% (seen) and 5.2% (unseen) in mIoU; QAM-GPDN boosts functional pose AP by 2.89% (seen) and 3.76% (unseen). Real-robot experiments validate real-world effectiveness.

Index terms

Grasping; RGB-D Perception; Data Sets for Robot Learning