ICRA 2024

CoFRIDA: Self-Supervised Fine-Tuning for Human-Robot Co-Painting

Peter Schaldenbrand, Gaurav Parmar, Jun-Yan Zhu, James McCann, Jean Oh


Abstract

Prior robot painting and drawing work, such as FRIDA, has focused on decreasing the sim-to-real gap and expanding input modalities for users, but interaction with these systems generally happens only at the input stage. To support interactive, human-robot collaborative painting, we introduce the Collaborative FRIDA (CoFRIDA) robot painting framework, which can co-paint by modifying and engaging with content already painted by a human collaborator. To improve text-image alignment, FRIDA's major weakness, our system uses pre-trained text-to-image models. However, pre-trained models perform poorly in real-world co-painting because they (1) do not understand the constraints and abilities of the robot and (2) cannot co-paint without making unrealistic edits to the canvas and overwriting existing content. We propose a self-supervised fine-tuning procedure that tackles both issues, allowing pre-trained, state-of-the-art text-image alignment models to be used with robots to enable co-painting in the physical world. Our open-source approach, CoFRIDA, creates paintings and drawings that match the input text prompt more clearly than FRIDA, both from a blank canvas and from one containing human-created work. More generally, our fine-tuning procedure successfully encodes the robot's constraints and abilities into a foundation model, showing promise as an effective method for reducing sim-to-real gaps. https://pschaldenbrand.github.io/cofrida/
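The abstract does not spell out how the self-supervised fine-tuning data is built. One plausible reading, offered here only as an assumption, is that training pairs come from stroke sequences the robot can actually execute (for example, a simulated painting): deleting some strokes yields a "partially painted" canvas, and the model learns to map that partial canvas plus the text prompt back to the complete painting, so it learns both the robot's stroke vocabulary and how to add to existing content rather than overwrite it. The sketch below shows only that pairing step under this assumption; `Stroke`, `make_copainting_pair`, and `build_dataset` are hypothetical names, and rendering and model training are omitted.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stroke:
    # Hypothetical, simplified brush-stroke parameters (assumption, not CoFRIDA's stroke model).
    x: float
    y: float
    length: float
    color: tuple


def make_copainting_pair(strokes, keep_fraction, rng):
    """Return (partial, full): a random subset of strokes standing in for a
    partially painted canvas (original stroke order preserved), and the full painting."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    n_keep = round(len(strokes) * keep_fraction)
    kept_idx = sorted(rng.sample(range(len(strokes)), n_keep))
    partial = [strokes[i] for i in kept_idx]
    return partial, list(strokes)


def build_dataset(paintings, captions, keep_fractions, seed=0):
    """Build hypothetical (partial canvas, text) -> full painting training examples."""
    rng = random.Random(seed)
    examples = []
    for strokes, caption in zip(paintings, captions):
        for frac in keep_fractions:
            partial, full = make_copainting_pair(strokes, frac, rng)
            examples.append({"input": partial, "target": full, "text": caption})
    return examples


if __name__ == "__main__":
    painting = [Stroke(0.1 * i, 0.2, 0.05, (30, 60, 90)) for i in range(8)]
    data = build_dataset([painting], ["a frog astronaut"], [0.0, 0.5])
    for ex in data:
        print(len(ex["input"]), "->", len(ex["target"]), ex["text"])
```

Keeping a range of `keep_fraction` values (including 0.0, a blank canvas) would let one model cover both painting from scratch and continuing a human's partial work, which matches the two settings the abstract evaluates.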

Index terms

Human-Robot Collaboration; Art and Entertainment Robotics; Deep Learning Methods