ICRA 2026

T(R, O) Grasp: Efficient Graph Diffusion of Robot-Object Spatial Transformation for Cross-Embodiment Dexterous Grasping

Xin Fei, Zhixuan Xu, Huaicong Fang, Tianrui Zhang, Lin Shao

AI summary

T(R,O) Grasp uses a memory-efficient graph diffusion model to synthesize cross-embodiment dexterous grasps, reporting a 94.83% average success rate at 0.21 s per inference.

Dexterous Grasping, Graph Diffusion, Cross-Embodiment, Spatial Transformation, Generative Robotics, Efficient Inference

Problem

Current dexterous grasp synthesis methods face a trade-off between computational efficiency, cross-embodiment generalization, and robustness to initialization, often suffering from high memory overhead or brittle performance under partial observations.

Approach

The authors propose the T(R,O) Graph, a unified representation encoding spatial transformations between object patches and robotic hand links, and train a transformer-based graph diffusion model to efficiently generate grasps without relying on feasible initial poses.
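To make the idea of a graph of object-to-hand spatial transformations concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: the class name `TROGraph`, the node names, and the choice of one edge per (patch, link) pair are all hypothetical, and the real representation also encodes geometric features that are omitted here. Each edge simply stores the pose of a hand link expressed in the frame of an object patch.

```python
import math
from dataclasses import dataclass

# A rigid transform is stored as (R, t): a 3x3 rotation (nested tuples) and a 3-vector.

def rot_z(theta):
    """Rotation about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def mat_vec(R, v):
    return tuple(sum(R[i][k] * v[k] for k in range(3)) for i in range(3))

def mat_mul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def transpose(R):
    return tuple(tuple(R[j][i] for j in range(3)) for i in range(3))

def invert(T):
    """Inverse of a rigid transform: (R^T, -R^T t)."""
    R, t = T
    Rt = transpose(R)
    return Rt, tuple(-x for x in mat_vec(Rt, t))

def compose(A, B):
    """Return A * B (apply B first, then A)."""
    Ra, ta = A
    Rb, tb = B
    rotated = mat_vec(Ra, tb)
    return mat_mul(Ra, Rb), tuple(rotated[i] + ta[i] for i in range(3))

def relative(T_world_a, T_world_b):
    """Pose of frame b expressed in frame a: inv(T_world_a) * T_world_b."""
    return compose(invert(T_world_a), T_world_b)

@dataclass
class TROGraph:
    """Hypothetical container: world poses for object patches and hand links."""
    patch_poses: dict  # patch name -> (R, t) in the world frame
    link_poses: dict   # link name  -> (R, t) in the world frame

    def edges(self):
        """One edge per (patch, link) pair, labeled with the link pose in the patch frame."""
        return {(p, l): relative(Tp, Tl)
                for p, Tp in self.patch_poses.items()
                for l, Tl in self.link_poses.items()}

IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Toy scene: one object patch and two hand links (illustrative values only).
graph = TROGraph(
    patch_poses={"patch_0": (rot_z(math.pi / 2), (1.0, 0.0, 0.0))},
    link_poses={
        "thumb_tip": (IDENTITY, (1.0, 1.0, 0.0)),
        "palm": (IDENTITY, (0.0, 0.0, 0.0)),
    },
)

for (patch, link), (R, t) in graph.edges().items():
    print(patch, link, tuple(round(x, 3) for x in t))
```

Because every edge is a relative transform, composing a patch pose with its edge recovers the link's world pose. A generative model that predicts such edges can therefore be followed by a solver that recovers joint values from the implied link poses.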

Key results

94.83% average success rate across diverse dexterous hands
0.21 s inference time and 41 grasps/second throughput on a single NVIDIA A100 40GB GPU
68% reduction in GPU memory usage compared to prior interaction-centric methods
Enables reliable closed-loop manipulation through initialization-independent diffusion sampling
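The latency and throughput figures above can be related with a short calculation. The sketch below assumes, without confirmation from the summary, that 0.21 s is the time for a single inference call and that 41 grasps/second is measured with batched sampling. Under that assumption, the ratio of the two indicates how much batching helps.

```python
latency_s = 0.21    # reported time for one inference call
throughput = 41.0   # reported grasps per second

# Rate if grasps were generated one call at a time, with no batching.
sequential_rate = 1.0 / latency_s

# Implied speed-up from batching (equivalently, throughput * latency_s).
batching_gain = throughput / sequential_rate

print(f"sequential: {sequential_rate:.2f} grasps/s, implied gain: {batching_gain:.2f}x")
```

Read this way, roughly 8.6 grasps are produced per 0.21 s window, compared with one grasp per window for strictly sequential calls.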

Why it matters

It establishes a scalable, real-time capable foundation for cross-embodiment dexterous manipulation, bridging the gap between high-fidelity grasp generation and practical robotic deployment.

Abstract

Dexterous grasping remains a central challenge in robotics due to the complexity of its high-dimensional state and action space. We introduce T(R, O) Grasp, a diffusion-based framework that efficiently generates accurate and diverse grasps across multiple robotic hands. At its core is the T(R, O) Graph, a unified representation that models spatial transformations between robotic hands and objects while encoding their geometric properties. A graph diffusion model, coupled with an efficient inverse kinematics solver, supports both unconditioned and conditioned grasp synthesis. Extensive experiments on a diverse set of dexterous hands show that T(R, O) Grasp achieves an average success rate of 94.83%, an inference time of 0.21 s, and a throughput of 41 grasps per second on an NVIDIA A100 40GB GPU, substantially outperforming existing baselines. In addition, our approach is robust and generalizable across embodiments while significantly reducing memory consumption. More importantly, the high inference speed enables closed-loop dexterous manipulation, underscoring the potential of T(R, O) Grasp to scale into a foundation model for dexterous grasping. The code, appendix, and videos are available at https://nus-lins-lab.github.io/trograspweb/.

This research is supported by the Ministry of Education, Singapore, under the Academic Research Fund Tier 1 (FY2024).

Index terms

Grasping; Dexterous Manipulation; Deep Learning in Grasping and Manipulation