ICRA 2026

Generate, Transfer, Adapt: Learning Functional Dexterous Grasping from a Single Human Demonstration

Xingyi He, Adhitya Polavaram, Yunhao Cao, Om Deshmukh, Tianrui Wang, Xiaowei Zhou, Kuan Fang

AI summary

CorDex enables robots to learn robust, functional dexterous grasps for novel objects from just one human demonstration video by combining a correspondence-based data synthesis engine with a multimodal grasp prediction network.

Dexterous grasping, functional grasping, learning from demonstration, data synthesis, multimodal learning, robot manipulation

Problem

Learning functional dexterous grasping is hindered by the scarcity of large-scale, high-quality grasp datasets and the lack of integrated semantic and geometric reasoning in existing models, making it difficult to generalize to unseen objects.

Approach

The framework uses a three-stage data engine to generate diverse training grasps from a single human video via 2D-3D correspondence transfer and physics-informed optimization, paired with a multimodal network that fuses RGB and geometric features to predict grasps for novel objects.
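The transfer step can be illustrated with a minimal sketch. It assumes each surface point on the demonstrated object and on a generated object carries a feature descriptor, and it moves each demonstrated contact point to the target point whose descriptor is most similar. The function names, the cosine-similarity matching, and the toy data are all hypothetical; the summary does not specify how CorDex estimates correspondences, and the physics-informed optimization that adapts the grasp afterwards is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_by_feature(query_feat, target_feats):
    """Index of the target descriptor most similar to the query descriptor."""
    return max(range(len(target_feats)),
               key=lambda i: cosine(query_feat, target_feats[i]))


def transfer_contacts(contact_ids, src_feats, tgt_feats, tgt_points):
    """Map demonstrated contact points onto a new object instance.

    contact_ids: indices of contact points on the source (demo) object.
    src_feats / tgt_feats: per-point descriptors (hypothetical inputs).
    tgt_points: 3D coordinates of the target object's surface points.
    """
    return [tgt_points[nearest_by_feature(src_feats[i], tgt_feats)]
            for i in contact_ids]


# Toy example: three points per object with 2D descriptors.
src_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt_feats = [[0.0, 2.0], [3.0, 0.1], [2.0, 2.0]]
tgt_points = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.0)]

transferred = transfer_contacts([0, 2], src_feats, tgt_feats, tgt_points)
print(transferred)
```

In a full pipeline the transferred contacts would only initialize the grasp; a refinement stage would still need to resolve penetration and stability on the new geometry.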

Key results

Generates 11 million grasp-image pairs for 900 objects across 9 categories from a single demo
Achieves a 69% success rate on unseen real-world objects
Outperforms state-of-the-art baselines in simulation and real-world experiments
Enables robust category-level generalization to novel objects with large shape variations

Why it matters

It provides a scalable, low-data pathway for robots to master complex tool-use and manipulation tasks, significantly advancing practical dexterous manipulation.

Abstract

Functional grasping with dexterous robotic hands is a key capability for enabling tool use and complex manipulation, yet progress has been constrained by two persistent bottlenecks: the scarcity of large-scale datasets and the absence of integrated semantic and geometric reasoning in learned models. In this work, we present CorDex, a framework that robustly learns dexterous functional grasps of novel objects from synthetic data generated from just a single human demonstration. At the core of our approach is a correspondence-based data engine that generates diverse, high-quality training data in simulation. Based on the human demonstration, our data engine generates diverse object instances of the same category, transfers the expert grasp to the generated objects through correspondence estimation, and adapts the grasp through optimization. Building on the generated data, we introduce a multimodal prediction network that integrates visual and geometric information. By devising a local–global fusion module and an importance-aware sampling mechanism, we enable robust and computationally efficient prediction of functional dexterous grasps. Through extensive experiments across various object categories, we demonstrate that CorDex generalizes well to unseen object instances and significantly outperforms state-of-the-art baselines. For additional results and videos, please visit https://cordex-manipulation.github.io.
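The abstract does not describe how the importance-aware sampling mechanism works. As a rough illustration of the general idea only, the sketch below draws a fixed-size subset of points with probability biased toward higher importance scores, using the Efraimidis-Spirakis weighted sampling scheme. The scheme, the function name, and the scores are assumptions made for this example and are not taken from CorDex.

```python
import random


def importance_sample(points, weights, k, seed=0):
    """Sample up to k points without replacement, biased toward high weights.

    Uses Efraimidis-Spirakis keys u ** (1 / w) with u drawn uniformly from
    [0, 1). Points with non-positive weight are never selected. This is an
    assumed stand-in, not the sampling rule used in the paper.
    """
    rng = random.Random(seed)
    keyed = []
    for point, weight in zip(points, weights):
        if weight <= 0:
            continue
        u = rng.random()
        keyed.append((u ** (1.0 / weight), point))
    # Larger keys win; sort on the key only so points need not be comparable.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [point for _, point in keyed[:k]]


# Hypothetical per-point importance scores, e.g. proximity to a functional part.
points = ["a", "b", "c", "d", "e"]
scores = [0.0, 5.0, 1.0, 0.0, 2.0]

subset = importance_sample(points, scores, k=2, seed=1)
print(subset)
```

Sampling fewer points where importance is low is one common way to cut computation while keeping the regions that matter for the grasp densely represented.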

Index terms

Dexterous Manipulation; Deep Learning in Grasping and Manipulation