ICRA 2026

DextrAH-RGB: Visuomotor Policies to Grasp Anything with Dexterous Hands

Ritvik Singh, Arthur Allshire, Ankur Handa, Nathan Ratliff, Karl Van Wyk

AI summary

First robust sim-to-real transfer of an end-to-end stereo RGB policy for complex dexterous grasping without depth sensors or CAD models.

dexterous grasping, end-to-end vision, sim-to-real transfer, stereo RGB, reinforcement learning, geometric fabrics

Problem

Prior approaches to dexterous grasping with multi-fingered hands rely on depth sensing, CAD models, or static planning, which limits their robustness and generalization in unstructured real-world environments.

Approach

A state-based teacher policy is first trained with reinforcement learning in simulation. It is then distilled, using online DAgger, into a stereo RGB student policy built on cross-attention transformers. Both policies act through a geometric fabric controller.
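
The teacher-to-student distillation loop can be illustrated with a toy sketch of online DAgger. This is not the paper's implementation: the 1-D dynamics, the `observe` "camera" mapping, the linear student, and every function name below are assumptions made for illustration, and a least-squares fit stands in for training the transformer student. The one point it shows is that states are visited under the student's own actions while the labels always come from the privileged teacher.

```python
import random


def teacher_action(state, gain=0.5):
    """Privileged teacher: acts on the true state (stand-in for the RL-trained teacher)."""
    return -gain * state


def observe(state):
    """Hypothetical 'camera': the student only sees this transformed view of the state."""
    return 2.0 * state + 1.0


def fit_linear(observations, actions):
    """Least-squares fit of action = w * obs + b (stand-in for training the student network)."""
    n = len(observations)
    mean_o = sum(observations) / n
    mean_a = sum(actions) / n
    var_o = sum((o - mean_o) ** 2 for o in observations)
    cov = sum((o - mean_o) * (a - mean_a) for o, a in zip(observations, actions))
    w = cov / var_o
    b = mean_a - w * mean_o
    return w, b


def dagger(iterations=5, episodes=4, horizon=10, seed=0):
    """Online DAgger on a toy 1-D system with dynamics state <- state + action."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # untrained student: always outputs zero action
    dataset_obs, dataset_act = [], []
    for _ in range(iterations):
        for _ in range(episodes):
            state = rng.uniform(-1.0, 1.0)
            for _ in range(horizon):
                obs = observe(state)
                # Label the state the student actually reached with the teacher's action.
                dataset_obs.append(obs)
                dataset_act.append(teacher_action(state))
                # Step the system with the student's action, not the teacher's.
                state = state + (w * obs + b)
        # Refit the student on the aggregated dataset from all iterations so far.
        w, b = fit_linear(dataset_obs, dataset_act)
    return w, b
```

In this toy setting the teacher's action is an exact linear function of the observation, so `dagger()` recovers a student with `w` close to -0.25 and `b` close to 0.25. A real visuomotor student has no such closed form, which is why the data is gathered under the student's own rollouts.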

Key results

Robust sim-to-real transfer of end-to-end RGB dexterous grasping
Stereo vision architecture reduces positional error and outperforms monocular baselines
Successful grasping of unseen objects across diverse textures and HDR lighting
Eliminates dependency on depth sensors, CAD models, and controlled lighting

Why it matters

Enables practical, deployable dexterous robots for real-world manipulation tasks in unstructured environments where depth sensors fail.

Abstract

One of the most important, yet challenging, skills for a dexterous robot is grasping a diverse range of objects. Much of the prior work has been limited by speed, generality, or reliance on depth maps and object poses. In this paper, we introduce DextrAH-RGB, a system that can perform dexterous arm-hand grasping end-to-end from RGB image input. We train a policy in simulation through reinforcement learning that acts on a geometric fabric controller to dexterously grasp a wide variety of objects. We then distill this into an RGB-based policy strictly in simulation using photorealistic tiled rendering. To our knowledge, this is the first work that is able to demonstrate robust sim-to-real transfer of an end-to-end (monocular or stereo) RGB-based policy for complex, dynamic, contact-rich tasks such as dexterous grasping with multi-fingered hands. Unlike previous methods, DextrAH-RGB requires no explicit depth or CAD models, making it significantly more practical and robust in varied real-world lighting and texture conditions. It generalizes to novel objects and scenes, offering a strong step toward deployable, vision-based dexterous manipulation.

Affiliations: NVIDIA, Santa Clara, CA, USA; University of California, Berkeley, Berkeley, CA, USA.

Index terms

Perception for Grasping and Manipulation, Multifingered Hands, Grippers and Other End-Effectors