ICRA 2026

Language-Guided Dexterous Functional Grasping by LLM Generated Grasp Functionality and Synergy for Humanoid Manipulation

Caldwell and Fei Chen, Senior


AI summary


SayFuncGrasp leverages LLMs and hand synergies to synthesize versatile, language-guided functional grasps for humanoid robots, achieving over 70% manipulation success in real-world tests.

Dexterous functional grasping; Language-guided manipulation; Large language models; Hand synergies; Humanoid robotics

Problem

Current dexterous grasping methods rely on visual input and pre-defined concepts, struggling to understand language instructions, optimize high-degree-of-freedom hands, and generalize to novel manipulation tasks.

Approach

The framework uses an LLM to infer grasp functionality from text, grounds it visually, and generates fine grasp configurations via a low-dimensional hand synergy space rather than optimizing all joints individually.
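The idea of working in a synergy space can be illustrated with a minimal sketch. It assumes a linear, PCA-style synergy model in which each joint configuration is a mean pose plus a weighted sum of a few basis vectors; the summary does not say how SayFuncGrasp builds its synergy space, so the function name `synergy_to_joints`, the 4-joint hand, and all numbers below are hypothetical and for illustration only.

```python
from typing import List, Sequence


def synergy_to_joints(
    z: Sequence[float],
    mean_pose: Sequence[float],
    basis: Sequence[Sequence[float]],
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[float]:
    """Map low-dimensional synergy coordinates z to full joint angles.

    Assumed linear model: q = mean_pose + sum_k z[k] * basis[k],
    followed by clipping each joint to its [lower, upper] limit.
    """
    if len(z) != len(basis):
        raise ValueError("need exactly one coefficient per synergy vector")
    joints = list(mean_pose)
    for coeff, vec in zip(z, basis):
        if len(vec) != len(joints):
            raise ValueError("each synergy vector must cover every joint")
        for j, v in enumerate(vec):
            joints[j] += coeff * v
    # Keep the result inside the (hypothetical) joint limits.
    return [min(max(q, lo), hi) for q, lo, hi in zip(joints, lower, upper)]


# Hypothetical 4-joint hand described by 2 synergies (illustrative values).
MEAN_POSE = [0.5, 0.5, 0.5, 0.5]
BASIS = [
    [1.0, 1.0, 1.0, 1.0],   # synergy 0: all joints flex together
    [1.0, -1.0, 0.0, 0.0],  # synergy 1: joints 0 and 1 move in opposition
]
LOWER = [0.0] * 4
UPPER = [1.5] * 4

# A policy or optimizer only has to choose 2 numbers instead of 4 joint angles.
q = synergy_to_joints([0.25, 0.5], MEAN_POSE, BASIS, LOWER, UPPER)
print(q)  # [1.25, 0.25, 0.75, 0.75]
```

The saving grows with the hand: for a high-DoF dexterous hand, a grasp search over a handful of synergy coefficients replaces a search over every joint angle, at the cost of only reaching poses that the chosen basis can express.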

Key results

LLM-driven inference of grasp functionality from natural language
Synergy-based policy for efficient fine grasp synthesis
64.66% grasp and 70.41% manipulation success rates on real robots
Significant improvement over the baseline method in open-set grasp functionality generalization (simulation experiments)

Why it matters

Enables humanoid robots to execute complex, task-specific manipulations via verbal commands, significantly boosting adaptability and reducing programming overhead for industrial and service applications.

Abstract

Dexterous Functional Grasping (DFG) is the crucial first step for humanoid robots to perform generalized manipulation tasks. However, enabling robots to learn language-guided DFG skills in real-world environments presents several challenges, including comprehending the complex relationship between task instructions and grasp functionality, generating feasible functional grasps of dexterous hands, and handling generalization for novel functional concepts. To tackle these challenges, we introduce SayFuncGrasp, a Large Language Model (LLM) based DFG framework that can synthesize versatile dexterous functional grasps from language instructions and achieve generalization on novel functional concepts. SayFuncGrasp first harnesses the open-ended manipulation knowledge from an LLM to infer grasp functionality based on language instructions. Subsequently, it employs the inferred grasp functionality to synthesize plausible DFG actions characterized by hand synergies. Simulation experiments show that SayFuncGrasp significantly outperforms the baseline method in open-set grasp functionality generalization. Real robot experiments demonstrate the effectiveness and generalizability of SayFuncGrasp for interactive humanoid manipulation tasks, achieving an overall grasp success rate of 64.66% and a manipulation success rate of 70.41%.

Note to Practitioners: This research was motivated by the practical challenge of enabling humanoid robots with high-DoF dexterous hands to perform functional grasping based on verbal instructions. In industrial settings, such capabilities can significantly enhance the versatility and adaptability of humanoid assistants, allowing them to perform complex manipulations simply by being told what to do, thereby reducing programming complexity and increasing flexibility. Current dexterous functional grasping methods rely solely on visual input, without the ability to process language instructions. Furthermore, they are restricted to pre-defined functional concepts and cannot be generalized to novel object classes and manipulation tasks within natural language. Our newly proposed language-guided dexterous functional grasping system takes advantage of open-ended manipulation knowledge from LLMs to produce generalized functional grasps of dexterous robot hands according to verbal commands. Our experiment results demonstrate improved versatility and generalizability compared to the state-of-the-art.

*This work is supported in part by the Research Grants Council of the Government of the Hong Kong SAR via the Grant 24209021, 14213324, C7100-22GF and in part by the InnoHK of the Government of the Hong Kong SAR via the Hong Kong Centre for Logistics Robotics. (Corresponding author: Fei Chen.)

Zhuo Li, Junjia Liu, Zhihao Li, Zhipeng Dong, Tao Teng and Fei Chen are with the Department of Mechanical and Automation Engineering, T-Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong (e-mail: zli@mae.cuhk.edu.hk; jjliu@mae.cuhk.edu.hk; zhihaoli@mae.cuhk.edu.hk; zhipengdongneu@gmail.com; tao.teng@ieee.org; f.chen@ieee.org). Yongsheng Ou is with the Department of Control Science and Engineering, Dalian University of Technology, Dalian, China (e-mail: yoo2023@dlut.edu.cn). Darwin Caldwell is with the Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy (e-mail: darwin.caldwell@iit.it).

Index terms

Dexterous Manipulation; Deep Learning in Grasping and Manipulation