SM^3: Self-Supervised Multi-Task Modeling with Multi-View 2D Images for Articulated Objects
Haowen WANG, Zhen Zhao, Zhao Jin, Zhengping Che, Liang Qiao, Huang Yakun, Zhipeng Fan, Qiao XiuQuan, Jian Tang
Abstract
Reconstructing real-world objects and estimating their movable joint structures are pivotal technologies within the field of robotics. Previous research has predominantly focused on supervised approaches, relying on annotated datasets to model articulated objects within limited categories. However, these approaches fall short of effectively addressing the diversity present in the real world. To tackle this issue, we propose a self-supervised interaction perception method, referred to as SM^3, which leverages multi-view RGB images captured before and after interaction to model articulated objects, identify the movable parts, and infer the parameters of their rotating joints. By constructing 3D geometries and textures from the captured 2D images, SM^3 achieves integrated optimization of movable part and joint parameters during the reconstruction process, obviating the need for annotations. Furthermore, we introduce the MMArt dataset, an extension of PartNet-Mobility, encompassing multi-view and multi-modal data of articulated objects spanning diverse categories. Evaluations demonstrate that SM^3 surpasses existing benchmarks across various categories and objects, and its adaptability in real-world scenarios has been thoroughly validated.