An Offline Learning of Behavior Correction Policy for Vision-Based Robotic Manipulation
Qingxiuxiong Dong, Toshimitsu Kaneko, Masashi Sugiyama
Abstract
Offline learning usually requires a large dataset for training. In this paper, we focus on vision-based robotic manipulation tasks and exploit certain task properties to achieve offline learning with a small dataset. We propose a two-stage agent consisting of a tentative decision stage and a correction stage. The tentative decision stage determines a tentative action from the original camera image, and the correction stage determines a correction to that action from an image cropped according to the tentative action. The correction stage uses task properties to obtain a cropped image that contains task-relevant features, which enables efficient correction. Moreover, the two stages can be trained separately, so general offline learning algorithms can be applied directly. We conduct experiments that combine the two-stage agent with conventional offline reinforcement learning and imitation learning algorithms. In both cases, we benchmark the proposed method on RLBench and demonstrate that the correction stage significantly improves task performance.
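The two-stage decision flow described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: the action is taken to be a 2D pixel location (x, y), the crop is a fixed-size square centered on the tentative action, and the correction is added to the tentative action. The names `crop_around`, `two_stage_act`, `toy_tentative`, and `toy_correction` are hypothetical, and the toy policies stand in for learned networks.

```python
import numpy as np


def crop_around(image, center_xy, size):
    """Return a size x size patch centered on center_xy (x, y).

    The image is zero-padded so the patch stays centered near the borders.
    Assumption: the crop is a fixed-size square around the tentative action.
    """
    h, w = image.shape[:2]
    half = size // 2
    # Keep the crop center inside the image.
    cx = int(np.clip(round(float(center_xy[0])), 0, w - 1))
    cy = int(np.clip(round(float(center_xy[1])), 0, h - 1))
    # Pad only the two spatial axes; leave any channel axis untouched.
    pad_width = [(half, half), (half, half)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_width)
    # Pixel (cy, cx) of the original image sits at (cy + half, cx + half)
    # in the padded image, so the patch starts at (cy, cx).
    return padded[cy:cy + size, cx:cx + size]


def two_stage_act(image, tentative_policy, correction_policy, crop_size=32):
    """Tentative decision on the full image, then a correction on the crop.

    Assumption: the correction is additive in the same (x, y) action space.
    """
    tentative = np.asarray(tentative_policy(image), dtype=float)
    patch = crop_around(image, tentative, crop_size)
    correction = np.asarray(correction_policy(patch), dtype=float)
    return tentative + correction


# Toy stand-ins for the two learned stages (hypothetical, for illustration).
def toy_tentative(image):
    # A coarse guess that is a few pixels away from the target.
    return (37.0, 22.0)


def toy_correction(patch):
    # Offset from the patch center to the brightest pixel in the patch.
    row, col = np.unravel_index(np.argmax(patch), patch.shape)
    half = patch.shape[0] // 2
    return (col - half, row - half)


# A 64 x 64 grayscale image with a single bright target at x=40, y=20.
image = np.zeros((64, 64))
image[20, 40] = 1.0

final_action = two_stage_act(image, toy_tentative, toy_correction)
```

In this toy setting the correction stage only has to resolve a small offset inside the patch, which is one way to read the claim that the cropped image makes correction efficient. Because each stage is an ordinary mapping from an image to an action-space output, each could in principle be fitted separately with an off-the-shelf offline learning algorithm.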