ICRA 2023

Joint Segmentation and Grasp Pose Detection with Multi-Modal Feature Fusion Network

Xiaozheng Liu, Yunzhou Zhang, He Cao, Shan Dexing, Jiaqi Zhao

Abstract

Efficient grasp pose detection is essential for robotic manipulation in cluttered scenes. However, most methods use only point clouds or only images for prediction, ignoring the complementary advantages of the two modalities. In this paper, we present a multi-modal fusion network for joint segmentation and grasp pose detection. We design a point cloud and image co-guided feature fusion module that fuses features and adaptively estimates the importance of each point-pixel feature pair. Moreover, we develop a seed point sampling algorithm that simultaneously considers distance, semantics, and attention scores. For the selected seed points, we adopt a local feature aggregation module to fully exploit the local spatial features in the grasp region. Experimental results on the GraspNet-1Billion dataset show that our network outperforms several state-of-the-art methods. We also conduct real-robot grasping experiments to demonstrate the effectiveness of our approach.
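The abstract does not say how distance, semantics, and attention are combined during seed point sampling. The sketch below is a hypothetical illustration only, not the authors' algorithm. It assumes semantics is a per-point foreground (object) mask, distance is the farthest-point-sampling criterion, and attention is a per-point score in [0, 1]. The function `sample_seed_points` and the parameter `attn_weight` are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def sample_seed_points(
    points: Sequence[Point],
    is_object: Sequence[bool],
    attention: Sequence[float],
    num_seeds: int,
    attn_weight: float = 0.5,
) -> List[int]:
    """Hypothetical seed sampler: farthest point sampling over foreground
    points, with each point's distance score scaled by its attention score."""
    # Semantics (assumed): only points labelled as object are candidates.
    candidates = [i for i, fg in enumerate(is_object) if fg]
    if not candidates or num_seeds <= 0:
        return []

    # Start from the candidate with the highest attention score.
    first = max(candidates, key=lambda i: attention[i])
    selected = [first]
    chosen = {first}

    # Distance from every candidate to its nearest already-selected seed.
    min_dist = {i: math.dist(points[i], points[first]) for i in candidates}

    target = min(num_seeds, len(candidates))
    while len(selected) < target:
        remaining = [i for i in candidates if i not in chosen]
        # Distance term (farthest point sampling) weighted by attention:
        # attn_weight = 0 gives plain FPS over the foreground points.
        best = max(
            remaining,
            key=lambda i: min_dist[i]
            * ((1.0 - attn_weight) + attn_weight * attention[i]),
        )
        selected.append(best)
        chosen.add(best)
        # Update nearest-seed distances with the newly selected seed.
        for i in candidates:
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[best]))
    return selected
```

For example, with points `[(0, 0, 0), (1, 0, 0), (0.1, 0, 0), (5, 0, 0), (2, 2, 0)]`, mask `[True, True, True, False, True]`, and attention `[0.9, 0.2, 0.8, 1.0, 0.5]`, requesting three seeds returns `[0, 4, 1]`. The background point at index 3 is never chosen despite its high attention, and the point at index 2 is skipped because it nearly duplicates the first seed. The multiplicative weighting is just one plausible way to trade off spatial coverage against attention.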

Index terms

Perception for Grasping and Manipulation; Deep Learning in Grasping and Manipulation; Object Detection, Segmentation and Categorization