TransDexNet: An End-To-End Motion Retargeting Network with Transformer for Dexterous Hand Teleoperation from RGB Images
Jiaying Tan, Qing Gao, Yuanchuan Lai
AI summary
Problem
Existing vision-based dexterous hand teleoperation methods rely on costly specialized hardware, multi-stage pipelines that cause latency and error accumulation, or depth sensors that limit real-world deployment.
Approach
The authors introduce TransDexNet, a dual-branch Transformer network that aligns latent features from human and robotic hand images to directly regress the dexterous hand's joint angles from a single RGB input.
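To make the dual-branch idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes linear encoders as stand-ins for the Transformer branches, an L1 regression loss, and a mean-squared alignment penalty between the two latent features. The names `encode`, `predict_joints`, `training_loss`, `align_weight`, and all dimensions are hypothetical; the source does not specify the loss form or the layer sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM = 48      # flattened toy "image" size (hypothetical)
LATENT_DIM = 16   # shared latent feature size (hypothetical)
NUM_JOINTS = 6    # number of robot joint angles (hypothetical)

# Linear stand-ins for the two encoder branches and the regression head.
W_human = rng.normal(scale=0.1, size=(IMG_DIM, LATENT_DIM))
W_robot = rng.normal(scale=0.1, size=(IMG_DIM, LATENT_DIM))
W_head = rng.normal(scale=0.1, size=(LATENT_DIM, NUM_JOINTS))


def encode(images, weights):
    """Map a batch of flattened images to latent features."""
    return np.tanh(images @ weights)


def predict_joints(human_images):
    """Inference path: human-hand image -> joint angles, human branch only."""
    return encode(human_images, W_human) @ W_head


def training_loss(human_images, robot_images, target_angles, align_weight=1.0):
    """Regression loss plus a latent alignment penalty (assumed form)."""
    z_human = encode(human_images, W_human)
    z_robot = encode(robot_images, W_robot)
    # L1 error between predicted and ground-truth joint angles.
    regression = np.mean(np.abs(z_human @ W_head - target_angles))
    # Pull the human and robot latent features toward each other.
    alignment = np.mean((z_human - z_robot) ** 2)
    return regression + align_weight * alignment, regression, alignment


batch = 4
human = rng.normal(size=(batch, IMG_DIM))
robot = rng.normal(size=(batch, IMG_DIM))
angles = rng.uniform(0.0, 1.5, size=(batch, NUM_JOINTS))

total, regression, alignment = training_loss(human, robot, angles)
predicted = predict_joints(human)
```

In this sketch the robot-hand branch is used only during training to shape the latent space; at inference time a single human-hand image passes through the human branch and the regression head.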
Key results
- Achieves 0.076 rad average joint angle error
- Enables real-time inference at 0.22 seconds per frame (about 4.5 frames per second)
- Introduces TransDexData, a 91,000-sample paired RGB dataset
- Demonstrates accurate retargeting in simulation and real-world experiments
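For readers who want to relate these figures to familiar units, here is a small sketch of how an average joint angle error could be computed and converted. It assumes the metric is the mean absolute difference over all samples and joints, which the source does not state explicitly; the function name `mean_joint_angle_error` and the toy values are made up for illustration.

```python
import math


def mean_joint_angle_error(predicted, target):
    """Mean absolute difference (rad) over all samples and joints."""
    errors = [
        abs(p - t)
        for pred_row, target_row in zip(predicted, target)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(errors) / len(errors)


# Two toy samples with three joints each (made-up values, in radians).
predicted = [[0.10, 0.50, 1.00], [0.20, 0.40, 0.90]]
target = [[0.15, 0.45, 1.10], [0.20, 0.50, 0.80]]

toy_error_rad = mean_joint_angle_error(predicted, target)

# Unit conversions for the reported figures.
reported_error_deg = math.degrees(0.076)  # 0.076 rad in degrees
frames_per_second = 1.0 / 0.22            # 0.22 s per frame as a frame rate
```

Under these conversions, 0.076 rad corresponds to roughly 4.35 degrees per joint.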
Why it matters
Enables low-cost, real-time dexterous hand teleoperation that needs only an RGB camera rather than specialized hardware, with applications in rehabilitation, manufacturing, and hazardous environment rescue.
Abstract
Dexterous hand teleoperation is becoming increasingly common, yet existing methods rarely provide both efficiency and convenience. The core challenge is to achieve motion retargeting from the human hand to a dexterous hand. To address this, we introduce TransDexNet, an end-to-end vision-based motion retargeting architecture for dexterous hands. Equipped with a Vision Transformer backbone, it takes a single RGB image of a human hand and directly regresses the joint angles of a dexterous hand without any intermediate pose estimation. The architecture employs dual branches bridged by an alignment layer to close the gaps in degrees of freedom (DoFs), geometry, and kinematics between the human and dexterous hands, enabling domain-invariant latent features. To train TransDexNet, we built a dataset named TransDexData, consisting of 91,000 RGB images of human hands paired with the corresponding dexterous hand RGB images and joint angles. In evaluation, the proposed network achieves an average joint angle error of 0.076 rad. Both simulation and real-world experiments demonstrate accurate and efficient performance. The project page is available at: https://joyyyy-gaint.github.io/TransDexNet.
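As an illustration of the paired data described in the abstract, one sample could be represented as below. This is a sketch under assumptions: the class name `PairedSample`, the field names, the file paths, and the three-joint example are hypothetical, and the actual TransDexData layout and joint count are not given in the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairedSample:
    """One paired training example (field names are hypothetical)."""
    human_image_path: str      # RGB image of the human hand
    robot_image_path: str      # RGB image of the dexterous hand in the matching pose
    joint_angles: List[float]  # dexterous-hand joint angles, in radians


def validate(sample: PairedSample, num_joints: int) -> bool:
    """Check that a sample carries the expected number of joint angles."""
    return len(sample.joint_angles) == num_joints


# Made-up example record; paths and values are placeholders.
sample = PairedSample("human/000001.png", "robot/000001.png", [0.1, 0.4, 0.9])
```

Keeping the robot image alongside the joint angles reflects the pairing the abstract describes: the robot image feeds the second branch during training, while the joint angles serve as the regression target.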