ICRA 2024

STNet: Spatio-Temporal Fusion-Based Self-Attention for Slip Detection in Visuo-Tactile Sensors

Jin Lu, Bangyan Niu, Huan Ma, Zhu Jiafeng, Jingjing Ji

Abstract

Slip detection plays a pivotal role in robotic dexterity, not only improving the reliability and precision of manipulation but also contributing to safety, efficiency, and adaptability. Deep learning-based slip detection algorithms commonly struggle to concentrate on key features when faced with the dense 3D shape data obtained by visuo-tactile sensors. Data from non-contact locations can interfere with slip judgments, and neglecting inter-frame relationships can also lead to slip detection failure. In this paper, a new self-attention network based on spatio-temporal sequence fusion, STNet, is proposed to perform slip detection by allocating more attention to the object-sensor contact area when processing complex 3D shape data. A binocular visuo-tactile system (BVTS) is designed and fabricated for dataset construction. The resulting 3D shape dataset contains four motion patterns: stationary, pressing, rolling, and slipping. Self-attention architectures with and without the spatio-temporal sequence fusion mechanism (denoted STNet and TemNet, respectively) are trained on the same dataset. Experiments demonstrate the validity of STNet, which reaches 98.91% slip detection accuracy, and the ablation studies confirm the effectiveness of the spatio-temporal sequence fusion mechanism.
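
The abstract does not specify the STNet architecture, so the following is only a minimal sketch of the general idea it names: per-patch features from several frames are fused into one token sequence, and scaled dot-product self-attention is applied across it. Everything here is an assumption made for illustration, not the authors' method: the function names `fuse_spatio_temporal` and `self_attention`, the identity query/key/value projections (a real network would learn these), the normalized frame index appended as a temporal tag, and the toy two-feature patches.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_spatio_temporal(frames):
    """Flatten T frames of P patch features into one sequence of T*P tokens.

    Assumption: each token gets the normalized frame index appended as a
    simple temporal tag. This is a hypothetical fusion scheme.
    """
    num_frames = len(frames)
    fused = []
    for t, patches in enumerate(frames):
        for feat in patches:
            fused.append(list(feat) + [t / max(num_frames - 1, 1)])
    return fused


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Returns the attended outputs and the attention weight matrix.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(dim)]
        )
    return outputs, weights


# Toy input (made up): 2 frames, 3 patches per frame, 2 features per patch.
# Only the middle patch has non-zero features, standing in for a contact area.
frames = [
    [[0.0, 0.0], [2.0, 0.5], [0.0, 0.0]],
    [[0.0, 0.0], [2.0, 1.5], [0.0, 0.0]],
]
tokens = fuse_spatio_temporal(frames)
outputs, weights = self_attention(tokens)

# Attention paid by the frame-0 contact token to every token in both frames.
print([round(w, 3) for w in weights[1]])
```

In this toy input, the contact token's query places most of its weight on the two contact tokens (one per frame), because their dot products with it are the largest. This only illustrates how attention over a fused sequence can link the same region across frames; it makes no claim about what the trained STNet actually learns.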

Index terms

Deep Learning for Visual Perception; Force and Tactile Sensing; Perception for Grasping and Manipulation