ICRA 2023

GSNet: Model Reconstruction Network for Category-Level 6D Object Pose and Size Estimation

Penglei Liu, Qieshi Zhang, Jun Cheng

Abstract

Category-level 6D pose and size estimation aims to estimate the rotation, translation, and size of observed object instances from an arbitrary viewpoint in a cluttered scene. Compared with instance-level 6D pose estimation, category-level estimation poses two main challenges. First, the algorithm must estimate the 6D pose and size of unseen objects for which no 3D models are available. Second, different instances of the same category can differ greatly in shape. This paper proposes a novel method to estimate the 6D pose and size of unseen objects from an RGB-D image. To handle intra-class shape variation, we propose an autoencoder that is trained on a set of object models to learn the structure-invariant and shape-variant features of intra-class objects, and that constructs a category-level prior model containing the structure and shape features. To address the absence of 3D models, this paper proposes a model reconstruction network combining 3D graph convolution and spherical convolution (GSNet), which reconstructs the 3D model of the observed instance from the input RGB-D image and the prior model and establishes dense correspondences between the reconstructed 3D model and the observed instance. Finally, the random sample consensus (RANSAC) algorithm and the Umeyama algorithm are used to estimate the 6D pose and size of the object. Extensive experiments on benchmark datasets show that the proposed method achieves state-of-the-art performance in category-level 6D object pose estimation. To demonstrate that our method can be applied to robotic grasping and manipulation tasks in industry and daily life, we deploy it on a physical UR5 robot to grasp unseen instances of known categories, and the results validate the efficacy of the proposed method.
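
The final step named in the abstract, RANSAC combined with the Umeyama algorithm, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `umeyama` and `ransac_umeyama`, the iteration count, and the inlier threshold are all assumptions chosen for illustration. The sketch assumes that dense correspondences are already available as two arrays of matched 3D points (reconstructed model points and observed points) and recovers a scale, a rotation, and a translation relating them.

```python
import numpy as np


def umeyama(src, dst):
    """Least-squares similarity transform (s, R, t) such that dst ~ s * R @ src + t."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


def ransac_umeyama(src, dst, iters=200, thresh=0.01, seed=0):
    """Hypothetical RANSAC wrapper: fit on random triplets, keep the largest inlier set."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        sample = src[idx] - src[idx].mean(axis=0)
        # Skip degenerate (coincident or collinear) samples.
        if np.linalg.matrix_rank(sample, tol=1e-9) < 2:
            continue
        s, R, t = umeyama(src[idx], dst[idx])
        err = np.linalg.norm(dst - (s * src @ R.T + t), axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("RANSAC found no usable inlier set")
    # Refit on all inliers of the best hypothesis.
    s, R, t = umeyama(src[best], dst[best])
    return s, R, t, best
```

In this sketch the rotation `R` and translation `t` give the 6D pose, while the scale `s` relates the size of the reconstructed model to that of the observed object. How the paper itself converts the estimated transform into a size estimate is not stated in the abstract.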

Index terms

Deep Learning in Grasping and Manipulation; Perception for Grasping and Manipulation; Deep Learning for Visual Perception