Behavior-Actor: Behavioral Decomposition and Efficient-Training for Robotic Manipulation
Wenyi Jiang, Baowei Xv, Zhihao Cui
Abstract
Language-conditioned robotic manipulation shows great potential for tackling a wide range of tasks. However, generalizing to unseen commands remains a challenge, and existing methods carry a heavy data collection cost. Large language models (LLMs) have recently demonstrated impressive natural language understanding capabilities. In this work, we propose a novel scheme called Behavior-Actor (BehAct), which uses an LLM to decompose language commands into executable behaviors in a Retrieval-Augmented Generation (RAG) manner. An end-to-end actor is then trained to execute the identified behaviors. In BehAct, the LLM acts as the “brain”, while the actor acts as the “hand”. A single actor model is trained from scratch on 11 real-world tasks and 40 behaviors using 276 demonstrations, only about 7 per behavior on average. We achieve a 68% average success rate on seen commands, which is comparable to recent works. Moreover, BehAct attains a 45% average success rate on unseen commands, double that of the baseline approach. The LLM-agnostic design of BehAct makes it possible to adopt more advanced LLMs without fine-tuning. Our code has been made publicly available.
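To make the "brain" side of this scheme concrete, the following is a minimal sketch of retrieval-augmented command decomposition. It is an illustration under stated assumptions, not the BehAct implementation: the behavior library, the token-overlap retriever, the prompt wording, and the `stub_llm` stand-in are all hypothetical, and a real system would likely use an embedding-based retriever and an actual LLM call.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical behavior library (assumption); the 40 behaviors
# used in the paper are not listed in the abstract.
BEHAVIOR_LIBRARY = [
    "open the drawer",
    "close the drawer",
    "pick up the red block",
    "place the block in the drawer",
    "press the button",
]


def overlap_score(query: str, candidate: str) -> int:
    """Count tokens shared by query and candidate (multiset intersection)."""
    q = Counter(query.lower().split())
    c = Counter(candidate.lower().split())
    return sum((q & c).values())


def retrieve(command: str, library: List[str], k: int = 3) -> List[str]:
    """Return the k library behaviors with the highest token overlap.

    A simple lexical retriever used here as a stand-in (assumption).
    """
    ranked = sorted(library, key=lambda b: overlap_score(command, b), reverse=True)
    return ranked[:k]


def build_prompt(command: str, retrieved: List[str]) -> str:
    """Assemble a prompt that restricts the LLM to the retrieved behaviors."""
    options = "\n".join(f"- {b}" for b in retrieved)
    return (
        "Decompose the command into an ordered list of behaviors.\n"
        f"Allowed behaviors:\n{options}\n"
        f"Command: {command}\n"
        "Answer with one behavior per line."
    )


def decompose(command: str, library: List[str],
              llm: Callable[[str], str], k: int = 3) -> List[str]:
    """Retrieve candidates, query the LLM, and keep only retrieved behaviors."""
    retrieved = retrieve(command, library, k)
    reply = llm(build_prompt(command, retrieved))
    allowed = set(retrieved)
    # Discard any line that is not one of the retrieved behaviors, so the
    # actor is never handed a behavior it was not trained on.
    return [line.strip() for line in reply.splitlines() if line.strip() in allowed]


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM; returns a fixed reply for the demo."""
    return "open the drawer\nplace the block in the drawer\nfly to the moon"


if __name__ == "__main__":
    plan = decompose("put the block in the drawer", BEHAVIOR_LIBRARY, stub_llm)
    for step in plan:
        print(step)  # each step would be passed to the trained actor
```

Because the LLM is passed in as a plain callable, swapping in a different model requires no change to the rest of the pipeline, which mirrors the LLM-agnostic design described above.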