GraspGen: A Diffusion-Based Framework for 6-DOF Grasping with On-Generator Training
Adithyavairavan Murali, Balakumar Sundaralingam, Yu-Wei Chao, Wentao Yuan, Jun Yamada, Mark Carlson, Fabio Ramos, Stan Birchfield, Dieter Fox, Clemens Eppner
AI summary
Problem
Learning-based 6-DOF grasping systems struggle to generalize across different robot embodiments and in-the-wild settings, often failing to act as turnkey solutions due to distribution shifts between offline training data and real-time generated grasps.
Approach
The framework uses a Diffusion-Transformer to iteratively generate grasp poses, paired with an efficient discriminator that scores and filters the generated grasps. The discriminator is trained on an On-Generator dataset, that is, on samples produced directly by the diffusion model, so it learns to score the same distribution of grasps it will see at inference time.
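To make the on-generator recipe concrete, here is a minimal sketch under heavy simplifying assumptions. A "grasp" is a single scalar. `toy_generator` stands in for sampling the trained diffusion model, `toy_simulator_label` stands in for simulated grasp evaluation, and the discriminator is a logistic regression rather than GraspGen's network. All function names are hypothetical. The sketch illustrates only the data flow: the discriminator's training set is drawn from the same generator whose outputs it later scores and filters.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical toy setting: a "grasp" is one scalar offset, and the stand-in
# simulator labels it a success (1) when the offset is positive.
Grasp = float


def toy_generator(rng: random.Random) -> Grasp:
    """Stand-in for drawing one grasp from the trained diffusion generator."""
    return rng.gauss(0.3, 1.0)


def toy_simulator_label(g: Grasp) -> int:
    """Stand-in for evaluating a grasp in physics simulation."""
    return 1 if g > 0.0 else 0


def build_on_generator_dataset(
    generator: Callable[[random.Random], Grasp],
    label_fn: Callable[[Grasp], int],
    n_samples: int,
    rng: random.Random,
) -> List[Tuple[Grasp, int]]:
    # On-generator: training samples come from the generator itself rather
    # than from an offline dataset, so train and inference distributions match.
    samples = [generator(rng) for _ in range(n_samples)]
    return [(g, label_fn(g)) for g in samples]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_discriminator(
    data: List[Tuple[Grasp, int]], epochs: int = 300, lr: float = 0.5
) -> Callable[[Grasp], float]:
    # Toy discriminator: logistic regression fit by full-batch gradient
    # descent on binary cross-entropy (a stand-in for a neural scorer).
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for g, y in data:
            err = sigmoid(w * g + b) - y  # d(BCE)/d(logit)
            grad_w += err * g
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return lambda g: sigmoid(w * g + b)


def generate_and_filter(
    generator: Callable[[random.Random], Grasp],
    score_fn: Callable[[Grasp], float],
    n_candidates: int,
    threshold: float,
    rng: random.Random,
) -> List[Grasp]:
    # Sample many candidates, keep those the discriminator scores at or above
    # the threshold, and return them best-first.
    candidates = [generator(rng) for _ in range(n_candidates)]
    kept = [g for g in candidates if score_fn(g) >= threshold]
    return sorted(kept, key=score_fn, reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    data = build_on_generator_dataset(toy_generator, toy_simulator_label, 500, rng)
    score = train_discriminator(data)
    grasps = generate_and_filter(toy_generator, score, 100, 0.9, rng)
    print(f"kept {len(grasps)} of 100 candidates")
```

Drawing the training set from the generator is one way to address the distribution shift described under Problem: a discriminator trained only on offline grasps may never have seen the kinds of samples the generator actually produces.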
Key results
- Outperforms baselines by over 48% AUC on single-object grasp generation
- Achieves state-of-the-art success rates on the FetchBench cluttered grasping benchmark
- Demonstrates robustness to partial point clouds, diverse gripper morphologies, and real-world noise
- Releases a 53-million-grasp simulated dataset and fully open-sources the framework
Why it matters
GraspGen provides a scalable, reliable, and modular grasping pipeline, advancing the deployment of general-purpose robotic manipulation in complex, real-world environments.
Abstract
Grasping is a fundamental robot skill, yet despite significant research advancements, learning-based 6-DOF grasping approaches are still not turnkey and struggle to generalize across different embodiments and in-the-wild settings. We build upon the recent success in modeling the object-centric grasp generation process as an iterative diffusion process. Our proposed framework, GraspGen, consists of a Diffusion-Transformer architecture that enhances grasp generation, paired with an efficient discriminator to score and filter sampled grasps. We introduce a novel and performant on-generator training recipe for the discriminator. To scale GraspGen to both objects and grippers, we release a new simulated dataset consisting of over 53 million grasps. We demonstrate that GraspGen outperforms prior methods in simulations with singulated objects across different grippers, achieves state-of-the-art performance on the FetchBench benchmark for grasping in clutter, and performs well on a real robot with noisy visual observations.
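The abstract frames grasp generation as an iterative diffusion process. The sketch below shows that iteration in its most standard form, a DDPM ancestral sampler, under stated assumptions: a linear noise schedule, a 6-D pose vector (hypothetically three translation plus three rotation parameters), and a toy exact noise predictor for training data concentrated on a single grasp `mu`. The predictor stands in for GraspGen's Diffusion-Transformer conditioned on the object point cloud. The schedule, pose parameterization, and all function names are assumptions, not GraspGen's implementation.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
# Hypothetical denoiser signature: (noisy pose x_t, step index t) -> predicted noise.
EpsModel = Callable[[Vector, int], Vector]


def linear_betas(
    num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02
) -> List[float]:
    """Linear DDPM noise schedule (an assumed choice, not GraspGen's)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def ddpm_sample(
    eps_model: EpsModel, dim: int, betas: List[float], rng: random.Random
) -> Vector:
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        alpha_t = 1.0 - betas[t]
        eps = eps_model(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Mean of p(x_{t-1} | x_t) computed from the predicted noise.
        mean = [(xi - coef * ei) / math.sqrt(alpha_t) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x


def make_point_mass_oracle(mu: Vector, betas: List[float]) -> EpsModel:
    """Exact noise predictor when every training grasp equals `mu`
    (a toy stand-in for a learned, point-cloud-conditioned network)."""
    alpha_bars = cumulative_alphas(betas)

    def eps_model(x: Vector, t: int) -> Vector:
        a = alpha_bars[t]
        return [(xi - math.sqrt(a) * mi) / math.sqrt(1.0 - a) for xi, mi in zip(x, mu)]

    return eps_model


if __name__ == "__main__":
    mu = [0.1, -0.2, 0.3, 0.0, 0.5, -0.4]  # toy 6-D "grasp pose"
    betas = linear_betas(1000)
    pose = ddpm_sample(make_point_mass_oracle(mu, betas), len(mu), betas, random.Random(0))
    print([round(v, 6) for v in pose])
```

Because the oracle is exact, the final denoising step lands on `mu`, which makes the loop easy to check. With a learned denoiser, one would instead draw many such samples and pass them to the discriminator for scoring and filtering.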