LLM-Handover: Exploiting LLMs for Task-Oriented Robot-Human Handovers
Andreea Roxana Tulbure, René Zurbruegg, Timm Grigat, Marco Hutter
AI summary
Problem
Most existing robot-human handover methods neglect the human’s post-handover task, relying on rigid assumptions that limit generalizability. Additionally, current part segmentation networks often fail to provide reliable spatial context for robotic manipulation.
Approach
The framework processes an RGB-D image and a natural language task description to infer relevant object parts and human grasp regions via an LLM. It then refines part segmentation through spatial reasoning and selects the optimal robot grasp that aligns with the intended post-handover use.
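The final grasp-selection step can be illustrated with a minimal sketch. It is not the authors' implementation: the `Grasp` class, the `select_handover_grasp` function, the part centroids, and the `min_clearance` threshold are all hypothetical names and values chosen for illustration. It assumes an upstream LLM has already named the part the human needs to hold for the task, and that part segmentation has produced one 3D centroid per part.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Grasp:
    position: tuple  # (x, y, z) in metres; hypothetical grasp centre
    quality: float   # score from any grasp generator, higher is better


def select_handover_grasp(grasps, part_centroids, human_part, min_clearance=0.05):
    """Return the best-scoring grasp that leaves `human_part` free.

    `part_centroids` maps a part name to its (x, y, z) centroid.
    `human_part` is the part the human should grasp (assumed LLM output).
    Returns None when no candidate satisfies both constraints.
    """
    human_centroid = part_centroids[human_part]
    feasible = []
    for g in grasps:
        # Assign the grasp to the part whose centroid is closest.
        nearest = min(part_centroids, key=lambda p: dist(g.position, part_centroids[p]))
        if nearest == human_part:
            continue  # robot would occupy the part the human needs
        if dist(g.position, human_centroid) < min_clearance:
            continue  # too close to the human grasp region
        feasible.append(g)
    return max(feasible, key=lambda g: g.quality, default=None)


# Example: handing over a knife for a cutting task.
# Assumed LLM answer: the human should receive the handle.
parts = {"handle": (0.0, 0.0, 0.0), "blade": (0.15, 0.0, 0.0)}
candidates = [
    Grasp((0.01, 0.0, 0.0), 0.9),  # on the handle, highest raw quality
    Grasp((0.14, 0.0, 0.0), 0.6),  # on the blade
    Grasp((0.10, 0.0, 0.0), 0.7),  # on the blade
]
best = select_handover_grasp(candidates, parts, human_part="handle")
```

In this toy case the highest-quality grasp sits on the handle and is rejected, so the robot holds the blade side and presents the handle. A real system would work on segmented point clouds and full 6-DoF grasp poses rather than centroids.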
Key results
- LLM-enhanced part segmentation improves detection rates and F1 scores over baselines
- 83% zero-shot success rate in hardware experiments across conventional and unconventional tasks
- 86% user preference for context-aware handovers with reduced regrasp frequency
- Open-source RGB-D dataset of 60 household objects with detailed part annotations
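For readers unfamiliar with the F1 metric mentioned above, the following sketch computes it for a single predicted part mask. The summary does not state how F1 is aggregated, so the pixel-level formulation here is an assumption. `mask_f1` is a hypothetical helper name.

```python
def mask_f1(pred_pixels, true_pixels):
    """Pixel-level F1 between a predicted and a ground-truth part mask.

    Both arguments are iterables of (row, col) pixel coordinates.
    """
    pred, truth = set(pred_pixels), set(true_pixels)
    tp = len(pred & truth)  # pixels labelled correctly
    if tp == 0:
        return 0.0  # no overlap (also covers empty masks)
    precision = tp / len(pred)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


# Toy masks: 2 of 3 predicted pixels are correct, 2 of 4 true pixels are found.
pred = [(0, 0), (0, 1), (1, 1)]
truth = [(0, 1), (1, 1), (2, 2), (2, 3)]
score = mask_f1(pred, truth)  # precision 2/3, recall 1/2
```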
Why it matters
Enables more intuitive and adaptable human-robot collaboration in everyday and industrial settings by bridging semantic task understanding with physical manipulation.
Abstract
Effective human-robot collaboration depends on task-oriented handovers, where robots present objects in ways that support the partner’s intended use. However, many existing approaches neglect the human’s post-handover action, relying on assumptions that limit generalizability. To address this gap, we propose LLM-Handover, a novel framework that integrates large language model (LLM)-based reasoning with part segmentation to enable context-aware grasp selection and execution. Given an RGB-D image and a task description, our system infers relevant object parts and selects grasps that optimize post-handover usability. To support evaluation, we introduce a new dataset of 60 household objects spanning 12 categories, each annotated with detailed part labels. We first demonstrate that our approach improves the performance of the underlying state-of-the-art part segmentation method in the context of robot-human handovers. Next, we show that LLM-Handover achieves higher grasp success rates and adapts better to post-handover task constraints. During hardware experiments, we achieve a success rate of 83% in a zero-shot setting over conventional and unconventional post-handover tasks. Finally, our user study underlines that our method enables more intuitive, context-aware handovers, with participants preferring it in 86% of cases.