ICRA 2026

GraspClutter6D: A Large-Scale Real-World Dataset for Robust Perception and Grasping in Cluttered Scenes

Seunghyeok Back, Joosoon Lee, Kangmin Kim, Heeseon Rho, Geonhyup Lee, Raeyoung Kang, Sangbeom Lee, Sangjun Noh, Youngjin Lee, Taeyeop Lee, Kyoobin Lee

AI summary

Grasping networks trained on this large-scale, highly cluttered real-world dataset significantly outperform those trained on existing datasets, in both simulation and real-world experiments.

robotic grasping, cluttered scenes, 6D pose estimation, real-world dataset, deep learning, manipulation

Problem

Existing robotic grasping datasets focus on simplified, low-occlusion scenes, failing to capture the complexity of real-world cluttered environments and limiting the robustness of learned manipulation systems.

Approach

The authors collected 1,000 highly cluttered real-world scenes across bins, shelves, and tables using multi-camera setups, generating 9.3 billion 6-DoF grasp annotations and 736K object poses, then benchmarked and trained state-of-the-art networks on this data.

Key results

1,000 densely cluttered real-world scenes with 62.6% average occlusion
9.3 billion collision-free 6-DoF grasp annotations and 736K 6D object poses
Grasping networks trained on the dataset significantly outperform those trained on existing datasets in simulation and real-world tests
New performance baselines for segmentation, pose estimation, and grasp detection

Why it matters

It provides the robotics community with a critical, large-scale real-world benchmark to develop and validate robust manipulation systems for practical, cluttered applications.

Abstract

Robust grasping in cluttered environments remains an open challenge in robotics. While benchmark datasets have significantly advanced deep learning methods, they mainly focus on simplistic scenes with light occlusion and insufficient diversity, limiting their applicability to practical scenarios. We present GraspClutter6D, a large-scale real-world grasping dataset featuring: (1) 1,000 highly cluttered scenes with dense arrangements (14.1 objects/scene, 62.6% occlusion), (2) comprehensive coverage across 200 objects in 75 environment configurations (bins, shelves, and tables) captured using four RGB-D cameras from multiple viewpoints, and (3) rich annotations including 736K 6D object poses and 9.3B feasible robotic grasps for 52K RGB-D images. We benchmark state-of-the-art segmentation, object pose estimation, and grasp detection methods to provide key insights into challenges in cluttered environments. Additionally, we validate the dataset’s effectiveness as a training resource, demonstrating that grasping networks trained on GraspClutter6D significantly outperform those trained on existing datasets in both simulation and real-world experiments. The dataset, toolkit, and annotation tools are publicly available on our project website: https://sites.google.com/view/graspclutter6d.
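As a rough sanity check on the scale figures quoted in the abstract, the per-scene and per-image averages can be derived from the headline numbers. The sketch below uses only the rounded values stated above (1,000 scenes, 52K images, 736K poses, 9.3B grasps); the constant names and the `per_unit` helper are illustrative and not part of the dataset toolkit, and the results are only as precise as the rounded inputs.

```python
# Rounded headline figures as quoted in the abstract (not exact counts).
SCENES = 1_000
IMAGES = 52_000          # "52K RGB-D images"
OBJECT_POSES = 736_000   # "736K 6D object poses"
GRASPS = 9_300_000_000   # "9.3B feasible robotic grasps"


def per_unit(total: int, units: int) -> float:
    """Average count per unit; hypothetical helper for this sketch."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total / units


images_per_scene = per_unit(IMAGES, SCENES)
poses_per_image = per_unit(OBJECT_POSES, IMAGES)
grasps_per_image = per_unit(GRASPS, IMAGES)

print(f"images per scene : {images_per_scene:.0f}")
print(f"poses per image  : {poses_per_image:.1f}")
print(f"grasps per image : {grasps_per_image:,.0f}")
```

Under these rounded inputs, each scene contributes about 52 images, each image carries roughly 14 annotated object poses (close to the reported 14.1 objects per scene), and each image has on the order of 180K grasp annotations. These are derived estimates, not statistics reported by the authors.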

Index terms

Data Sets for Robotic Vision; Data Sets for Robot Learning; Deep Learning in Grasping and Manipulation