ICRA 2026

BiGraspFormer: End-To-End Bimanual Grasp Transformer

Kangmin Kim, Seunghyeok Back, Geonhyup Lee, Sangbeom Lee, Sangjun Noh, Kyoobin Lee

AI summary

BiGraspFormer achieves state-of-the-art bimanual grasp success and diversity while keeping inference time under 0.05 seconds.

Bimanual grasping, end-to-end learning, transformer, robot manipulation, grasp generation, dual-arm coordination

Problem

Existing bimanual grasping methods rely on modular pipelines that separate grasp generation from bimanual evaluation. This separation leads to coordination problems such as collisions and unbalanced force distribution, and it adds computational complexity.

Approach

The authors propose BiGraspFormer, a unified end-to-end transformer built around a Single-Guided Bimanual (SGB) strategy: it first generates diverse single-grasp candidates, then conditions its bimanual pose and quality predictions directly on the learned features of those candidates.
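
The data flow in this approach can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names (`cross_attention`, `sketch_sgb`), the random stand-in features and weights, the feature size, and the 12-number pose encoding (two 6-DoF poses) are all assumptions made for illustration. The sketch shows only the conditioning step, in which bimanual queries attend to single-grasp features before pose and quality heads are applied.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values))
             for c in range(len(values[0]))]
        )
    return outputs


def linear(x, weight):
    """Apply a weight matrix (one row per output) to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def sketch_sgb(num_single=8, num_bimanual=4, dim=16, seed=0):
    """Hypothetical two-stage flow: single-grasp features guide bimanual heads."""
    rng = random.Random(seed)

    def rand_vec(n):
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    def rand_mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    # Stage 1 (stand-in): features of single-grasp candidates. In the paper
    # these come from a transformer decoder; here they are random vectors.
    single_feats = [rand_vec(dim) for _ in range(num_single)]

    # Stage 2: bimanual queries attend to the single-grasp features, so the
    # bimanual predictions are conditioned on them.
    bimanual_queries = [rand_vec(dim) for _ in range(num_bimanual)]
    fused = cross_attention(bimanual_queries, single_feats, single_feats)

    # Assumed heads: 12 numbers per candidate (two 6-DoF poses) and one
    # quality score squashed into (0, 1) by a sigmoid.
    pose_head = rand_mat(12, dim)
    quality_head = rand_mat(1, dim)
    poses = [linear(f, pose_head) for f in fused]
    qualities = [1.0 / (1.0 + math.exp(-linear(f, quality_head)[0])) for f in fused]
    return poses, qualities
```

With the default arguments, `sketch_sgb()` returns four candidate bimanual grasps, each a list of 12 numbers, and one quality score per candidate. In a trained model the random weights would be learned parameters and the stand-in features would be computed from the object point cloud.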

Key results

89.67% top-1% success rate under normal forces
59.72% success rate under external disturbance conditions
Superior grasp diversity across all tested object geometries
Inference time under 0.05 seconds, suitable for real-time deployment
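
The paper's exact evaluation protocol is not given in this summary. Assuming "top-1% success rate" means the share of successful grasps among the 1% of candidates with the highest predicted quality, the metric could be computed as in the sketch below. The function name and the toy data are hypothetical.

```python
import math


def top_percent_success_rate(scores, successes, percent=1.0):
    """Success rate (in %) among the top `percent` of grasps ranked by score.

    scores:    predicted quality score per grasp candidate
    successes: whether each candidate succeeded when executed
    percent:   size of the top slice, as a percentage of all candidates
    """
    if len(scores) != len(successes):
        raise ValueError("scores and successes must have the same length")
    if not scores:
        raise ValueError("at least one grasp candidate is required")

    # Always keep at least one candidate, even for very small sets.
    k = max(1, math.ceil(len(scores) * percent / 100.0))

    # Rank candidates by predicted quality, best first.
    ranked = sorted(zip(scores, successes), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return 100.0 * sum(1 for _, ok in top if ok) / k


# Toy example: four candidates, evaluated on the top 50% (two candidates).
scores = [0.9, 0.8, 0.1, 0.5]
successes = [True, False, True, True]
print(top_percent_success_rate(scores, successes, percent=50.0))  # 50.0
```

Ranking by predicted score before measuring success rewards a model whose quality estimates are well ordered, not just one that produces some good grasps.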

Why it matters

Advances practical dual-arm robotics by enabling stable, real-time manipulation of large and complex objects without modular pipeline overhead.

Abstract

Bimanual grasping is essential for robots to handle large and complex objects. However, existing methods either focus solely on single-arm grasping or employ separate grasp generation and bimanual evaluation stages, leading to coordination problems including collision risks and unbalanced force distribution. To address these limitations, we propose BiGraspFormer, a unified end-to-end transformer framework that directly generates coordinated bimanual grasps from object point clouds. Our key idea is the Single-Guided Bimanual (SGB) strategy, which first generates diverse single grasp candidates using a transformer decoder, then leverages their learned features through specialized attention mechanisms to jointly predict bimanual poses and quality scores. This conditioning strategy reduces the complexity of the 12-DoF search space while ensuring coordinated bimanual manipulation. Comprehensive simulation experiments and real-world validation demonstrate that BiGraspFormer consistently outperforms existing methods while maintaining efficient inference speed (<0.05 s), confirming the effectiveness of our framework. Code and supplementary materials are available at https://sites.google.com/bigraspformer
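
As a reading aid for the 12-DoF figure in the abstract: each gripper pose is a rigid transform with 3 translational and 3 rotational degrees of freedom, so a pair of poses has 12. The symbols below ($g_L$, $g_R$, $f_i$, $N$, $h$) are notation introduced here, not the paper's, and the second line is only one plausible way to write the conditioning the abstract describes.

```latex
\[
(g_L, g_R) \in SE(3) \times SE(3), \qquad
\dim\bigl(SE(3) \times SE(3)\bigr) = 6 + 6 = 12
\]
\[
(\hat{g}_L, \hat{g}_R, \hat{q}) = h\bigl(\{f_i\}_{i=1}^{N}\bigr), \qquad
f_i = \text{feature of single-grasp candidate } g_i \in SE(3)
\]
```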

Index terms

Bimanual Manipulation, Grasping, Perception for Grasping and Manipulation