Multi-Fingered Dragging of Unknown Objects and Orientations Using Distributed Tactile Information through Vision-Transformer and LSTM
Takahisa Ueno, Satoshi Funabashi, Hiroshi Ito, Alexander Schmitz, Shardul Kulkarni, Tetsuya Ogata, Shigeki Sugano
Abstract
Multi-fingered hands can be suitable for stable object manipulation. Furthermore, multi-fingered hands can acquire abundant tactile information, which is useful for recognizing the object’s properties and, in turn, for adapting the motion to the object. However, generating dexterous manipulation motions with multi-fingered hands equipped with high-density tactile sensors is challenging because the touch states are complex. Hence, tasks that conventionally require a high level of active tactile sensing simultaneously with motion generation, such as pulling an object into the hand while recognizing its posture, are difficult to accomplish. In this letter, we propose a novel deep predictive learning approach using Vision-Transformer (ViT) and Long Short-Term Memory (LSTM). The ViT’s attention mechanism can spatially focus on specific fingers, which are represented by distributed 3-axis tactile sensors (uSkin). The LSTM can preserve long time-series information about the manipulation, which makes it possible to change the desired motion according to the initial touch position and orientation relative to the target object. Results showed that the ViT-LSTM is effective in performing adaptive finger movements according to the properties of the object, i.e., its hardness and relative posture.
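To make the division of labor between the two components concrete, the following is a minimal, standard-library-only sketch of the general idea: self-attention over per-patch tactile tokens at each time step, followed by a recurrent LSTM update across time steps. It is not the authors' implementation. The patch count, the 3-axis token layout, the identity query/key/value projections, the mean pooling, the hidden size, and the names `self_attention`, `TinyLSTMCell`, and `encode_sequence` are all assumptions made for illustration; a real ViT uses learned projections, multiple heads, and positional embeddings, and the weights here are random rather than trained.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption: identity Q/K/V projections, single head. Each token is one
    hypothetical tactile patch (here a 3-axis force vector).
    Returns (attended tokens, attention weight matrix).
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        )
    return outputs, weights


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """A plain LSTM cell with small random (untrained) weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        self.hidden_size = hidden_size
        # One weight matrix and bias vector per gate:
        # i = input, f = forget, g = cell candidate, o = output.
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def step(self, x, h, c):
        z = x + h  # concatenate the input and the previous hidden state

        def linear(gate):
            return [
                sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])
            ]

        i = [sigmoid(v) for v in linear("i")]
        f = [sigmoid(v) for v in linear("f")]
        g = [math.tanh(v) for v in linear("g")]
        o = [sigmoid(v) for v in linear("o")]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new


def encode_sequence(frames, cell):
    """Attend over patches within each frame, then integrate frames over time."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    attn_per_step = []
    for tokens in frames:
        attended, weights = self_attention(tokens)
        d = len(attended[0])
        # Assumption: mean pooling over patches gives one feature per frame.
        pooled = [sum(tok[k] for tok in attended) / len(attended) for k in range(d)]
        h, c = cell.step(pooled, h, c)
        attn_per_step.append(weights)
    return h, attn_per_step


# Hypothetical data: 3 time steps, 4 tactile patches, 3-axis (x, y, z) readings.
frames = [
    [[0.0, 0.0, 0.1], [0.0, 0.1, 0.9], [0.1, 0.0, 0.2], [0.0, 0.0, 0.0]],
    [[0.1, 0.0, 0.3], [0.2, 0.1, 1.2], [0.1, 0.1, 0.4], [0.0, 0.0, 0.1]],
    [[0.2, 0.1, 0.5], [0.3, 0.2, 1.5], [0.2, 0.1, 0.6], [0.0, 0.1, 0.1]],
]
cell = TinyLSTMCell(input_size=3, hidden_size=4)
final_h, attn = encode_sequence(frames, cell)
```

In this sketch, each row of `attn[t]` is a probability distribution over the patches at time step `t`, which is the quantity one would inspect to see which patches the attention concentrates on, while `final_h` carries information accumulated over the whole sequence.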