A Self-Attention Multi-Task Learning Model for Garment Segmentation and Parts Recognition
Yilin Zhang, Alberto Elías Petrilli Barceló, Naoya Chiba, Koichi Hashimoto
Abstract
The integration of robotics in the garment industry remains relatively limited, primarily because garments are highly deformable. This study therefore explores a vision-based model for recognizing garments and garment parts to facilitate the use of robots in garment manipulation. The main objective is to detect and segment each garment piece on a randomly arranged table and provide multi-dimensional information about it, as well as to recognize garment parts such as the collar so that grasping points can be proposed for various robotic tasks. To achieve this goal, an MTL (Multi-Task Learning) model based on YOLOv8 and HyCTAS’s self-attention head is proposed. Transfer learning was applied, and the model was fine-tuned and tested on a self-collected dataset as well as the open-source garment dataset Fashionpedia. Experimental results demonstrate that this MTL model substantially improves processing speed with only a minimal decrease in mask average precision for each integrated vision task. While this preservation of performance is mainly attributed to the HyCTAS implementation, further improvements can be achieved by adding auxiliary tasks and loading weights pretrained on the single tasks.