IROS 2024

Force and Velocity Prediction in Human-Robot Collaborative Transportation Tasks through Video Retentive Networks

Jose Enrique Dominguez-Vidal, Alberto Sanfeliu

Abstract

In this article, we propose a generalization of Retentive Networks, a state-of-the-art Deep Learning architecture, so that it can accept video sequences as input. With this generalization, we design a force/velocity predictor for the medium-distance Human-Robot collaborative object transportation task. We achieve better results than with our previous predictor, reaching success rates on the test set of up to 93.7% in predicting the force to be exerted by the human and up to 96.5% in predicting the velocity of the human-robot pair during the next 1 s, and up to 91.0% and 95.0%, respectively, in real experiments. The new architecture also improves inference times by up to 32.8% across different graphics cards. Finally, an ablation test shows that one of the input variables used so far, the position of the task goal, could be discarded, which would allow the goal to be chosen dynamically by the human instead of being pre-set.

Index terms

Physical Human-Robot Interaction; Deep Learning Methods; Intention Recognition