Adaptive Tidying Robots: Learning from Interaction and Observation
Ryuichi Maeda, Adel Baselizadeh, Shin Watanabe, Ryo Kurazume, Jim Torresen
Abstract
Designing service robots that can tidy up in unfamiliar and dynamic human environments is a significant challenge. Such robots must not only recognize and manipulate a wide range of objects but also align their actions with tidying-up rules, which can vary greatly from one individual to another. To address these challenges, we propose a comprehensive software framework for service robots that integrates a Large Language Model (LLM) and Vision-Language Models (VLMs). Our framework enables robots to learn person-specific tidying-up rules through interaction and observation, and to identify and handle previously unseen objects and receptacles. It thereby offers a unified solution for recognizing, learning, and acting in diverse and dynamic human environments. We evaluate the framework in two ways: on a text-based benchmark dataset, to assess tidying-up rule learning, and in a simulated environment, to demonstrate practical tidying-up performance. On the text-based benchmark, our framework selects appropriate receptacles for unseen objects with high accuracy (87.4%), including cases that involve unseen receptacle categories. The simulation evaluation confirms the effectiveness of our framework in realistic environments and scenarios. This research advances the field of service robotics by presenting an integrated software solution that leverages an LLM and VLMs for more personalized and adaptable robot behavior in real-world tasks.