Can Policy Learning with Time Limits Be Used for Contact-Rich Industrial Manipulation?
Bharat Singh, Jack Kilpatrick, Sebastian Joya-Paez, Ryo Hanai, Ixchel G. Ramirez-Alpizar, Natsuki Yamanobe, Yukiyasu Domae
Abstract
Contact-rich industrial manipulation poses a significant challenge for reinforcement learning policies, requiring dexterous interactions with objects exhibiting complex contact dynamics. In industrial applications, completion deadlines are as important as task success, because the task must be integrated into wider processing pipelines. Further, in the standard reinforcement learning setting, failure to account for the remaining time in episodic tasks can result in state aliasing or inconsistent temporal difference errors. This work therefore seeks to determine the most effective way to integrate time limits into policy learning. We propose that the remaining time be used both as a policy input and as a scaling factor for the task success reward, and we demonstrate the effectiveness of this approach on the dexterous unscrewing of a nut from a bolt. The resulting time-based policy completes the unscrewing task with a success rate of 90% in 10 simulated trials, the highest of all approaches considered, including a standard baseline. Its average completion time across the trials is 21.67 seconds, given a 35-second time limit; while this is not the fastest of the methods considered, it may indicate more stable motion resulting from awareness of the time limit. Finally, the efficacy of the learned unscrewing policy is validated on a real UR5e manipulator for the nut-bolt disassembly task.
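As an illustration of the proposal above, the following minimal Python sketch shows one way the remaining time could be appended to the observation and used to scale the success reward. The abstract does not specify the functional form, so the linear scaling, the normalization to [0, 1], and the names `remaining_fraction`, `augment_observation`, `success_reward`, and `base_reward` are all assumptions made for illustration, not the authors' implementation. Only the 35-second limit is taken from the text.

```python
from typing import List, Sequence

TIME_LIMIT_S = 35.0  # time limit reported in the abstract


def remaining_fraction(elapsed_s: float, time_limit_s: float = TIME_LIMIT_S) -> float:
    """Fraction of the episode time still available, clipped to [0, 1]."""
    frac = 1.0 - elapsed_s / time_limit_s
    return min(1.0, max(0.0, frac))


def augment_observation(
    obs: Sequence[float], elapsed_s: float, time_limit_s: float = TIME_LIMIT_S
) -> List[float]:
    """Append the remaining-time fraction so the policy can condition on it."""
    return list(obs) + [remaining_fraction(elapsed_s, time_limit_s)]


def success_reward(
    succeeded: bool,
    elapsed_s: float,
    base_reward: float = 1.0,  # hypothetical magnitude, not from the paper
    time_limit_s: float = TIME_LIMIT_S,
) -> float:
    """Success reward scaled linearly by remaining time (assumed form)."""
    if not succeeded:
        return 0.0
    return base_reward * remaining_fraction(elapsed_s, time_limit_s)


if __name__ == "__main__":
    # Halfway through the episode: half of the time budget remains.
    print(augment_observation([0.1, 0.2], elapsed_s=17.5))
    print(success_reward(True, elapsed_s=17.5))
```

Under this assumed form, finishing earlier yields a larger success reward, and the extra observation entry lets the policy distinguish otherwise identical states that occur at different points in the episode.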