ICRA 2026

GraspGen: A Diffusion-Based Framework for 6-DOF Grasping with On-Generator Training

Adithyavairavan Murali, Balakumar Sundaralingam, Yu-Wei Chao, Wentao Yuan, Jun Yamada, Mark Carlson, Fabio Ramos, Stan Birchfield, Dieter Fox, Clemens Eppner

AI summary

GraspGen achieves state-of-the-art 6-DOF grasping accuracy and generalization across diverse grippers, cluttered scenes, and real-world conditions by combining a diffusion-based generator with a discriminator trained on its own generated samples.

6-DOF grasping · diffusion models · grasp generation · on-generator training · robotic manipulation · discriminative scoring

Problem

Learning-based 6-DOF grasping systems struggle to generalize across different robot embodiments and in-the-wild settings, often failing to act as turnkey solutions due to distribution shifts between offline training data and real-time generated grasps.

Approach

The framework employs a Diffusion-Transformer to iteratively generate grasp poses, paired with an efficient discriminator that is trained on an On-Generator dataset of samples produced directly by the diffusion model to accurately score and filter grasps.
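The generate-then-filter idea above can be sketched in a few lines. This is an illustrative toy, not GraspGen's API: the names `build_on_generator_dataset` and `filter_grasps`, the flat-tuple grasp representation, and the threshold are all hypothetical, and how generated grasps get their success labels (for example, by evaluating them in simulation) is an assumption not stated in this summary.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a 6-DOF grasp pose; a real system would use
# a rotation plus translation (e.g. a 4x4 homogeneous transform).
Grasp = Tuple[float, ...]


def build_on_generator_dataset(
    sample_grasps: Callable[[int], Sequence[Grasp]],
    label_grasp: Callable[[Grasp], bool],
    num_samples: int,
) -> List[Tuple[Grasp, bool]]:
    """Label the generator's own samples.

    The discriminator is then trained on grasps drawn from the same
    distribution it will score at inference time, rather than only on
    an offline dataset. `label_grasp` is an assumed success oracle.
    """
    return [(g, label_grasp(g)) for g in sample_grasps(num_samples)]


def filter_grasps(
    grasps: Sequence[Grasp],
    score_fn: Callable[[Grasp], float],
    threshold: float = 0.5,
    top_k: Optional[int] = None,
) -> List[Tuple[float, Grasp]]:
    """Score sampled grasps, drop those below threshold, sort best-first."""
    scored = sorted(
        ((score_fn(g), g) for g in grasps),
        key=lambda pair: pair[0],
        reverse=True,
    )
    kept = [(s, g) for s, g in scored if s >= threshold]
    return kept if top_k is None else kept[:top_k]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy generator: 1-D "grasps" drawn uniformly from [-1, 1].
    toy_sampler = lambda n: [(rng.uniform(-1.0, 1.0),) for _ in range(n)]
    # Toy oracle: a grasp "succeeds" when it is close to the origin.
    toy_label = lambda g: abs(g[0]) < 0.5
    dataset = build_on_generator_dataset(toy_sampler, toy_label, 100)
    print(sum(label for _, label in dataset), "of", len(dataset), "labeled successful")
    # Toy discriminator score: higher when closer to the origin.
    toy_score = lambda g: 1.0 - abs(g[0])
    print(filter_grasps(toy_sampler(10), toy_score, threshold=0.5, top_k=3))
```

In this sketch the sampler and the scorer are plain callables, so the generator and the discriminator can be swapped independently. That mirrors the modular pairing the summary describes without committing to any particular network architecture.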

Key results

Outperforms baselines by over 48% AUC on single-object grasp generation
Achieves state-of-the-art success rates on the FetchBench cluttered grasping benchmark
Demonstrates robustness to partial point clouds, diverse gripper morphologies, and real-world noise
Releases a 53-million-grasp simulated dataset and fully open-sources the framework

Why it matters

It offers a modular grasping pipeline, released with open-source code and a large simulated dataset, that aims to make 6-DOF grasping easier to deploy across different grippers, cluttered scenes, and noisy real-world observations.

Abstract

Grasping is a fundamental robot skill, yet despite significant research advancements, learning-based 6-DOF grasping approaches are still not turnkey and struggle to generalize across different embodiments and in-the-wild settings. We build upon the recent success on modeling the object-centric grasp generation process as an iterative diffusion process. Our proposed framework, GraspGen, consists of a Diffusion-Transformer architecture that enhances grasp generation, paired with an efficient discriminator to score and filter sampled grasps. We introduce a novel and performant on-generator training recipe for the discriminator. To scale GraspGen to both objects and grippers, we release a new simulated dataset consisting of over 53 million grasps. We demonstrate that GraspGen outperforms prior methods in simulations with singulated objects across different grippers, achieves state-of-the-art performance on the FetchBench benchmark for grasping in clutter, and performs well on a real robot with noisy visual observations.

Index terms

Grasping; Deep Learning in Grasping and Manipulation; Perception for Grasping and Manipulation