ICRA 2026

LLM-Handover: Exploiting LLMs for Task-Oriented Robot-Human Handovers

Andreea Roxana Tulbure, René Zurbruegg, Timm Grigat, Marco Hutter

AI summary

Integrating LLM-based reasoning with part segmentation enables robots to select task-aware grasps that significantly improve handover success rates and human preference.

Human-Robot Collaboration, Task-Oriented Handovers, Large Language Models, Part Segmentation, Grasp Selection, Physical Human-Robot Interaction

Problem

Most existing robot-human handover methods neglect the human’s post-handover task, relying on rigid assumptions that limit generalizability. Additionally, current part segmentation networks often fail to provide reliable spatial context for robotic manipulation.

Approach

The framework takes an RGB-D image and a natural-language task description and uses an LLM to infer the relevant object parts and the regions the human will grasp. It then refines the part segmentation through spatial reasoning and selects the robot grasp that best supports the intended post-handover use.
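The grasp-selection step described above can be pictured with a minimal sketch. This is not the authors' implementation. It assumes the LLM returns a JSON object naming the task-relevant part and the part the human will grasp, and that a part-labelled point cloud and scored grasp candidates are already available. It also assumes selection reduces to keeping the robot's grasp a fixed distance away from the human's region. `GraspCandidate`, `parse_llm_parts`, `select_robot_grasp`, the JSON keys, and the 3 cm margin are all hypothetical. The spatial-reasoning refinement of the segmentation is not modelled.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

import numpy as np


@dataclass
class GraspCandidate:
    """Hypothetical grasp proposal: contact position (metres) and a planner score."""
    position: np.ndarray  # shape (3,)
    score: float


def parse_llm_parts(response: str) -> tuple[str, str]:
    """Parse a hypothetical JSON reply, e.g. {"task_part": ..., "human_grasp_part": ...}."""
    data = json.loads(response)
    return data["task_part"], data["human_grasp_part"]


def select_robot_grasp(
    candidates: list[GraspCandidate],
    points: np.ndarray,       # (N, 3) segmented object point cloud
    part_labels: np.ndarray,  # (N,) part name for each point
    human_part: str,
    margin: float = 0.03,     # hypothetical clearance from the human's region, metres
) -> GraspCandidate | None:
    """Return the best-scoring grasp that keeps at least `margin` clearance
    from every point labelled as the human's grasp region, or None if none does."""
    human_pts = points[part_labels == human_part]
    if len(human_pts) == 0:
        # The part was not segmented: fall back to the best-scoring grasp overall.
        return max(candidates, key=lambda g: g.score, default=None)
    # Keep only grasps whose nearest human-region point is at least `margin` away.
    feasible = [
        g for g in candidates
        if np.min(np.linalg.norm(human_pts - g.position, axis=1)) >= margin
    ]
    return max(feasible, key=lambda g: g.score, default=None)


# Toy hammer along the x-axis: handle for x < 0.20 m, head beyond.
xs = np.linspace(0.0, 0.30, 31)
points = np.stack([xs, np.zeros_like(xs), np.zeros_like(xs)], axis=1)
labels = np.where(xs < 0.20, "handle", "head")

task_part, human_part = parse_llm_parts(
    '{"task_part": "head", "human_grasp_part": "handle"}'
)
candidates = [
    GraspCandidate(np.array([0.05, 0.0, 0.0]), score=0.9),  # on the handle
    GraspCandidate(np.array([0.27, 0.0, 0.0]), score=0.7),  # on the head
]
best = select_robot_grasp(candidates, points, labels, human_part)
print(best)
```

In this toy scene, the higher-scoring grasp on the handle is rejected because the human needs that part to hammer. The robot therefore takes the head. A hard clearance filter followed by score ranking keeps the example simple. A soft proximity penalty folded into the grasp score would be an equally plausible design.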

Key results

LLM-enhanced part segmentation improves detection rates and F1 scores over baselines
83% zero-shot success rate in hardware experiments across conventional and unconventional tasks
86% user preference for context-aware handovers with reduced regrasp frequency
Open-source RGB-D dataset of 60 household objects with detailed part annotations

Why it matters

Enables more intuitive and adaptable human-robot collaboration in everyday and industrial settings by bridging semantic task understanding with physical manipulation.

Abstract

Effective human-robot collaboration depends on task-oriented handovers, where robots present objects in ways that support the partner’s intended use. However, many existing approaches neglect the human’s post-handover action, relying on assumptions that limit generalizability. To address this gap, we propose LLM-Handover, a novel framework that integrates large language model (LLM)-based reasoning with part segmentation to enable context-aware grasp selection and execution. Given an RGB-D image and a task description, our system infers relevant object parts and selects grasps that optimize post-handover usability. To support evaluation, we introduce a new dataset of 60 household objects spanning 12 categories, each annotated with detailed part labels. We first demonstrate that our approach improves the performance of the state-of-the-art part segmentation method it builds on, in the context of robot-human handovers. Next, we show that LLM-Handover achieves higher grasp success rates and adapts better to post-handover task constraints. During hardware experiments, we achieve a success rate of 83% in a zero-shot setting over conventional and unconventional post-handover tasks. Finally, our user study underlines that our method enables more intuitive, context-aware handovers, with participants preferring it in 86% of cases.

Index terms

Physical Human-Robot Interaction, Human-Aware Motion Planning, Human-Robot Collaboration