ICRA 2024

Language-Conditioned Robotic Manipulation with Fast and Slow Thinking

Minjie Zhu, Yichen Zhu, Jinming Li, Junjie Wen, Zhiyuan Xu, Zhengping Che, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, Jian Tang


Abstract

Language-conditioned robotic manipulation aims to translate natural language instructions into executable actions, ranging from simple “pick-and-place” tasks to tasks that require intent recognition and visual reasoning. Inspired by the dual-process theory in cognitive science, which posits two parallel systems of fast and slow thinking in human decision-making, we introduce Robotics with Fast and Slow Thinking (RFST), a framework that mimics this human cognitive architecture: it classifies tasks and, based on the instruction type, decides which of the two systems should handle them. RFST consists of two key components: 1) an instruction discriminator that determines which system should be activated for the current user instruction, and 2) a slow-thinking system composed of a fine-tuned vision-language model aligned with the policy networks, which allows the robot to recognize the user’s intention or perform reasoning tasks. To evaluate our approach, we built a dataset of real-world trajectories capturing actions that range from spontaneous impulses to tasks requiring deliberate contemplation. Our results in both simulation and real-world scenarios confirm that our approach effectively handles complex tasks that demand intent recognition and reasoning.
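The fast/slow routing described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the instruction discriminator is built, so a hypothetical keyword check (`SLOW_CUES`, `discriminate`) stands in for it, and `slow_reasoner` is a hypothetical placeholder for the fine-tuned vision-language model. It is further assumed here that the slow system's output is a plain-text command the low-level policy can execute directly.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical cue phrases; a stand-in for a learned instruction discriminator.
SLOW_CUES = ("why", "which", "i am", "i'm", "thirsty", "hungry", "something to")


@dataclass
class Decision:
    system: str   # "fast" or "slow": which system handled the instruction
    command: str  # explicit command handed to the low-level policy (assumed format)


def discriminate(instruction: str) -> str:
    """Toy discriminator: route to the slow system if any cue phrase appears."""
    text = instruction.lower()
    return "slow" if any(cue in text for cue in SLOW_CUES) else "fast"


def route(instruction: str, slow_reasoner: Callable[[str], str]) -> Decision:
    """Dispatch an instruction to the fast or slow system."""
    system = discriminate(instruction)
    if system == "fast":
        # Fast path: the instruction is already an explicit, executable command.
        return Decision("fast", instruction)
    # Slow path: a (hypothetical) VLM call resolves intent or reasoning
    # into an explicit command for the policy.
    return Decision("slow", slow_reasoner(instruction))


if __name__ == "__main__":
    stub_vlm = lambda s: "pick up the water bottle"  # hypothetical VLM stub
    print(route("pick up the red block", stub_vlm))
    print(route("I'm thirsty", stub_vlm))
```

Here "pick up the red block" takes the fast path unchanged, while "I'm thirsty" is routed to the slow system, whose stub resolves the user's intent into an explicit pick command. Keeping the discriminator as a separate function mirrors the abstract's separation of components, so the keyword check could be swapped for a learned classifier without touching the dispatch logic.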

Index terms

AI-Enabled Robotics; Deep Learning in Grasping and Manipulation; Planning, Scheduling and Coordination