Proposal and Demonstration of a Robot Behavior Planning System Utilizing Video with Open Source Models in Real-World Environments
Yuki Akutsu, Takahiro Yoshida, Yuki Kato, Yuichiro Sueoka, Koichi Osuka
Abstract
In robotics, researchers have sought to use foundation models to control robots that can handle a wide variety of environments and tasks. Among these efforts, systems that plan robot behavior using video have been proposed. Such a system generates robot behaviors that do not depend on a specific environment or task: it generates videos from text input, drawing on the broad knowledge embedded in foundation models. In addition, a visual interface such as video makes it possible to inspect the behavioral indicators that guide the robot's operation. However, there are few examples of research on video-based robot behavior planning. Previous studies have emphasized verifying video-based behavior generation and have tested it only on simplified object manipulation in simulation. This is not enough to demonstrate the usefulness of video-based robot behavior planning in real-world environments. In addition, the systems from previous studies are not open, and such systems have not been sufficiently discussed. This paper attempts to construct video-based robot behavior planning as an open system and to verify the validity of the behavior planning on a physical robot. First, we construct a video-based robot behavior planning system around Robotis's TURTLEBOT3 Waffle Pi and Mobile Manipulator (referred to as "Waffle"). Second, we use the proposed system to create planning videos targeting a pick-and-place motion, and we control the arm of Waffle in experiments on the physical robot. Finally, by comparing the target coordinates from the planning video with the coordinates observed on the physical robot, we confirm whether Waffle can be controlled as planned. Errors are calculated from the coordinate comparison, and the control is performed again.
Based on the results, we verify whether the proposed system is useful for controlling robots in real-world environments.
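The coordinate comparison and re-control step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 3D coordinate representation in meters, the `gain` parameter, and the numeric values are all assumptions made for illustration. The sketch computes the error between a target coordinate taken from a planning video and the coordinate observed on the robot, then shifts the next command by that error.

```python
import math


def coordinate_error(target, observed):
    """Return the per-axis error (target - observed) and its Euclidean norm."""
    diff = tuple(t - o for t, o in zip(target, observed))
    return diff, math.sqrt(sum(d * d for d in diff))


def corrected_command(command, target, observed, gain=1.0):
    """Shift the previous command by the scaled error (hypothetical correction rule)."""
    diff, _ = coordinate_error(target, observed)
    return tuple(c + gain * d for c, d in zip(command, diff))


# Hypothetical values: target from the planning video, observation from the robot.
target = (0.20, 0.00, 0.10)
observed = (0.18, 0.01, 0.10)

diff, norm = coordinate_error(target, observed)
# The first command was the target itself, so the retry offsets it by the error.
next_command = corrected_command(target, target, observed)
```

A proportional correction of this kind is only one possible choice; the abstract states that errors are calculated and control is performed again, without specifying the correction rule.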