ICRA 2024

CAMInterHand: Cooperative Attention for Multi-View Interactive Hand Pose and Mesh Reconstruction

Guwen Han, Qi Ye, Anjun Chen, Jiming Chen

Abstract

Interactive hand mesh reconstruction from single-view images poses a significant challenge because of the severe occlusion and depth ambiguity inherent in interactive hand gestures. Recent approaches that employ probabilistic models and token-pruning techniques have shown decent results in multi-view human body reconstruction. Nevertheless, these methods do not fully utilize multi-scale semantic information from multi-view images and are not applicable to scenarios involving severe occlusion during dual-hand interactions. Meanwhile, current single-view methods reconstruct the left and right hands independently, which is ineffective at modeling the interaction between the two hands. To address these challenges, we propose CAMInterHand, a cooperative attention-based method for multi-view interactive hand pose and mesh reconstruction. Specifically, CAMInterHand extracts local pyramid features and global vertex features from multi-scale feature maps of multi-view images, enabling the exploration of rich local semantic information and facilitating effective feature alignment. Furthermore, CAMInterHand employs a cooperative attention fusion module to fuse all features from multi-view images, enhancing interactions among the vertices of both hands within global and local contexts. We conduct extensive experiments on the large-scale multi-view dataset InterHand2.6M, and CAMInterHand achieves a substantial performance improvement over existing methods for multi-view and single-view interactive hand reconstruction.
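
As a rough illustration of the general idea of attention-based fusion across views and across both hands, the sketch below applies plain scaled dot-product attention to random stand-in features. It is not the paper's cooperative attention fusion module, whose design the abstract does not specify. The tensor sizes, the mean-over-views queries, and the names `cross_attention`, `left`, `right`, and `context` are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Scaled dot-product attention: every query token attends over all
    # context tokens (no learned projections, for brevity).
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (num_queries, num_context)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ keys_values, weights

# Hypothetical sizes: 4 camera views, 16 vertex tokens per hand, 32-dim features.
rng = np.random.default_rng(0)
num_views, num_vertices, dim = 4, 16, 32
left = rng.standard_normal((num_views, num_vertices, dim))   # stand-in left-hand features
right = rng.standard_normal((num_views, num_vertices, dim))  # stand-in right-hand features

# Shared context: tokens of both hands from every view.
context = np.concatenate([left.reshape(-1, dim), right.reshape(-1, dim)], axis=0)

# One query per vertex per hand (assumption: average over views).
left_query = left.mean(axis=0)
right_query = right.mean(axis=0)

# Each hand's vertices attend to both hands across all views.
left_fused, left_weights = cross_attention(left_query, context)
right_fused, right_weights = cross_attention(right_query, context)
```

Because the context contains tokens from both hands, each fused vertex feature can draw on the other hand's features, which is the kind of two-hand interaction that independent per-hand reconstruction cannot capture.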

Index terms

Gesture, Posture and Facial Expressions; Visual Learning; Deep Learning for Visual Perception