HOGraspFlow: Taxonomy-Aware Hand-Object Retargeting for Multi-Modal SE(3) Grasp Generation
Yitian Shi, Zicheng Guo, Rosa Petra Wolf, Edgar Welte, Rania Rayyes
AI summary
Problem
Current grasp retargeting methods either collapse human grasp diversity by relying on simplified pinch templates or require reliable 3D object geometry and pose estimation, preventing direct use of in-the-wild video demonstrations.
Approach
The method conditions a denoising flow matching model on RGB foundation features, reconstructed hand contact maps, and grasp taxonomy priors to synthesize multi-modal SE(3) parallel-jaw grasps directly from a single RGB crop without explicit object models.
Key results
- High-fidelity grasp synthesis without explicit object geometry or contact input
- Consistent outperformance of diffusion-based variants in distributional fidelity and optimization stability
- Strong contact localization and grasp taxonomy recognition accuracy
- Over 83% grasp success rate in real-world experiments on unseen objects
Why it matters
Provides a scalable, vision-only pathway for transferring diverse human manipulation demonstrations to parallel-jaw robotic grippers in unstructured environments.
Abstract
We propose Hand-Object (HO) GraspFlow, an affordance-centric approach that retargets a single RGB image of a hand-object interaction (HOI) into multi-modal, executable parallel-jaw grasps without explicit geometric priors on target objects. Building on foundation models for hand reconstruction and vision, we synthesize SE(3) grasp poses with denoising flow matching (FM), conditioned on three complementary cues: RGB foundation features as visual semantics, HOI contact reconstruction, and a taxonomy-aware prior on grasp types. Our approach demonstrates high fidelity in grasp synthesis without explicit HOI contact input or object geometry, while maintaining strong contact and taxonomy recognition. A controlled comparison shows that HOGraspFlow consistently outperforms diffusion-based variants (HOGraspDiff), achieving high distributional fidelity and more stable optimization in SE(3). We demonstrate reliable, object-agnostic grasp synthesis from human demonstrations in real-world experiments, achieving an average success rate of over 83%. Code: https://github.com/YitianShi/HOGraspFlow
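To make the idea of flow matching on SE(3) more concrete, the following is a minimal NumPy sketch of the sampling step only: a pose is moved from a starting sample toward a grasp pose by Euler integration of a velocity field, with rotations updated through the SO(3) exponential map. It is not the authors' implementation. The function names (`so3_exp`, `so3_log`, `oracle_velocity`, `sample_grasp`) are hypothetical, and `oracle_velocity` is a closed-form stand-in that already knows the target pose; in the method described above this role would be played by a learned network conditioned on the RGB features, contact reconstruction, and taxonomy prior, none of which are modeled here.

```python
import numpy as np


def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def so3_exp(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near identity
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))


def so3_log(R):
    """Rotation matrix -> rotation vector (valid away from angle pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * vee
    return theta / (2.0 * np.sin(theta)) * vee


def oracle_velocity(R, p, t, R1, p1):
    """Toy stand-in (assumption) for a learned, conditioned velocity network.

    Returns the body-frame angular velocity and the linear velocity that
    follow the geodesic / straight-line path ending at (R1, p1) at t = 1.
    """
    scale = 1.0 / (1.0 - t)
    omega = scale * so3_log(R.T @ R1)
    v = scale * (p1 - p)
    return omega, v


def sample_grasp(R0, p0, R1, p1, n_steps=10):
    """Euler-integrate the velocity field from t = 0 to t = 1."""
    R, p = R0.copy(), p0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        omega, v = oracle_velocity(R, p, t, R1, p1)
        R = R @ so3_exp(dt * omega)  # rotation update stays on SO(3)
        p = p + dt * v               # translation update in R^3
    return R, p


if __name__ == "__main__":
    R_target = so3_exp(np.array([0.3, -0.5, 0.8]))
    p_target = np.array([0.1, 0.2, 0.4])
    R, p = sample_grasp(np.eye(3), np.zeros(3), R_target, p_target)
    print(np.allclose(R, R_target, atol=1e-6), np.allclose(p, p_target, atol=1e-6))
```

Updating the rotation by right-multiplying with `so3_exp(dt * omega)` keeps every intermediate pose a valid rotation matrix, which is the main practical reason to integrate on the manifold rather than on raw matrix entries. With a learned velocity field, different starting samples would generally end at different grasp poses, which is how a sampler of this kind can yield multi-modal outputs.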