Multi-Task Real-Robot Data with Gaze Attention for Dual-Arm Fine Manipulation
Heecheol Kim, Yoshiyuki Ohmura, Yasuo Kuniyoshi
Abstract
Deep imitation learning is a promising approach in robotic manipulation, enabling robots to acquire versatile and adaptable skills. In such research, robots have achieved generality across multiple objects by learning a variety of tasks. However, existing multi-task robot datasets have mainly focused on relatively imprecise single-arm tasks and have not addressed the fine-grained object manipulation that robots are expected to perform in the real world. In this study, we introduce a dataset for diverse object manipulation that includes dual-arm tasks and/or tasks that require fine manipulation. We generated a dataset containing 224k episodes (150 hours, 1,104 language instructions) that includes dual-arm fine tasks, such as bowl-moving, pencil-case opening, and banana-peeling. This dataset is publicly available. Additionally, the dataset includes visual attention signals, dual-action labels that separate actions into robust reaching trajectories and precise interactions with objects, and language instructions, all aimed at achieving robust and precise object manipulation. We applied the dataset to our Dual-Action and Attention model, which we designed for fine-grained dual-arm manipulation tasks and which is robust to covariate shift. We tested the model in over 7k real-robot manipulation trials, which demonstrated its capability to perform fine manipulation.