ICRA 2026

DextrAH-RGB: Visuomotor Policies to Grasp Anything with Dexterous Hands

Ritvik Singh, Arthur Allshire, Ankur Handa, Nathan Ratliff, Karl Van Wyk

AI summary

Claimed by the authors as the first robust sim-to-real transfer of an end-to-end RGB policy (monocular or stereo) for dexterous grasping with multi-fingered hands, with no depth sensors or CAD models.
dexterous grasping, end-to-end vision, sim-to-real transfer, stereo RGB, reinforcement learning, geometric fabrics

Problem

Prior approaches to dexterous grasping with multi-fingered hands rely on depth sensing, CAD models, or static planning, which limits their robustness and generalization in unstructured real-world environments.

Approach

Trains a state-based teacher policy via reinforcement learning in simulation, then distills it into a stereo RGB-based student policy using cross-attention transformers and online DAgger, acting through a geometric fabric controller.
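
The teacher-to-student distillation step can be illustrated with a toy version of online DAgger. This is a minimal sketch under strong simplifying assumptions, not the paper's implementation: both policies are linear maps rather than neural networks, "rendering" is a fixed linear projection with noise instead of photorealistic images, the dynamics are invented, and the geometric fabric controller is omitted. All names (`W_teacher`, `C_render`, `online_dagger`) are hypothetical. What the sketch does preserve is the defining feature of DAgger: the environment is advanced with the student's actions, while the teacher only supplies labels on the states the student actually visits.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, OBS_DIM, ACT_DIM = 4, 8, 2

# Hypothetical teacher: a fixed linear policy over the privileged state.
W_teacher = rng.normal(size=(ACT_DIM, STATE_DIM))
# Hypothetical "renderer": maps the state to a higher-dimensional observation.
# Orthonormal columns keep the toy regression well conditioned.
C_render = np.linalg.qr(rng.normal(size=(OBS_DIM, STATE_DIM)))[0]


def online_dagger(num_updates=300, num_envs=64, horizon=8, lr=0.3):
    """Distill the state-based teacher into an observation-based student."""
    W_student = np.zeros((ACT_DIM, OBS_DIM))
    states = rng.normal(size=(num_envs, STATE_DIM))
    losses = []
    for update in range(num_updates):
        if update > 0 and update % horizon == 0:
            # Start a new batch of episodes.
            states = rng.normal(size=(num_envs, STATE_DIM))
        # The student sees only noisy observations, never the state itself.
        obs = states @ C_render.T + 0.01 * rng.normal(size=(num_envs, OBS_DIM))
        student_act = obs @ W_student.T
        # The teacher labels the states that the student's rollout reached.
        teacher_act = states @ W_teacher.T
        err = student_act - teacher_act
        losses.append(float(np.mean(err ** 2)))
        # One gradient step on the mean squared imitation error.
        W_student -= lr * (2.0 * err.T @ obs / err.size)
        # DAgger: advance the environment with the STUDENT's action.
        states = 0.9 * states
        states[:, :ACT_DIM] += 0.1 * np.tanh(student_act)
    return W_student, losses


W_student, losses = online_dagger()
print(f"imitation loss: first={losses[0]:.4f}, last={losses[-1]:.6f}")
```

Because each update consumes the freshest rollout data and then discards it, this variant needs no aggregated replay dataset, which is one common reading of "online" DAgger.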

Key results

  • Robust sim-to-real transfer of end-to-end RGB dexterous grasping
  • Stereo vision architecture reduces positional error and outperforms monocular baselines
  • Successful grasping of unseen objects across diverse textures and HDR lighting
  • Eliminates dependency on depth sensors, CAD models, and controlled lighting
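
The Approach section names cross-attention as the mechanism behind the stereo student. As a rough illustration of how two camera views could be fused this way, the sketch below implements single-head cross-attention in NumPy, with tokens from the left image querying tokens from the right image. Token counts, embedding widths, and the random projection weights are illustrative assumptions. The paper's actual architecture is not described in this summary.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, W_q, W_k, W_v):
    """Single-head cross-attention: queries come from one view,
    keys and values come from the other view."""
    Q = query_tokens @ W_q
    K = context_tokens @ W_k
    V = context_tokens @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
N_LEFT, N_RIGHT, D_MODEL, D_HEAD = 6, 5, 16, 8  # illustrative sizes

# Stand-ins for patch embeddings of the left and right camera images.
left_tokens = rng.normal(size=(N_LEFT, D_MODEL))
right_tokens = rng.normal(size=(N_RIGHT, D_MODEL))
W_q, W_k, W_v = (
    rng.normal(size=(D_MODEL, D_HEAD)) / np.sqrt(D_MODEL) for _ in range(3)
)

fused, attn = cross_attention(left_tokens, right_tokens, W_q, W_k, W_v)
print(fused.shape, attn.shape)
```

Each row of `attn` shows how strongly one left-image token draws on each right-image token, which is the kind of cross-view correspondence a stereo policy can exploit without an explicit depth map.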

Why it matters

A step toward practical, deployable dexterous manipulation in unstructured real-world environments, including settings where depth sensing is unreliable.

Abstract

One of the most important, yet challenging, skills for a dexterous robot is grasping a diverse range of objects. Much of the prior work has been limited by speed, generality, or reliance on depth maps and object poses. In this paper, we introduce DextrAH-RGB, a system that can perform dexterous arm-hand grasping end-to-end from RGB image input. We train a policy in simulation through reinforcement learning that acts on a geometric fabric controller to dexterously grasp a wide variety of objects. We then distill this into an RGB-based policy strictly in simulation using photorealistic tiled rendering. To our knowledge, this is the first work that is able to demonstrate robust sim-to-real transfer of an end-to-end (monocular or stereo) RGB-based policy for complex, dynamic, contact-rich tasks such as dexterous grasping with multi-fingered hands. Unlike previous methods, DextrAH-RGB requires no explicit depth or CAD models, making it significantly more practical and robust in varied real-world lighting and texture conditions. It generalizes to novel objects and scenes, offering a strong step toward deployable, vision-based dexterous manipulation.

Affiliations: 1 NVIDIA, Santa Clara, CA, USA. 2 University of California, Berkeley, Berkeley, CA, USA.

Index terms

Perception for Grasping and Manipulation, Multifingered Hands, Grippers and Other End-Effectors
