ICRA 2026

DexKnot: Generalizable Visuomotor Policy Learning for Dexterous Bag-Knotting Manipulation

Jiayuan Zhang, Ruihai Wu, Haojun Chen, Yuran Wang, Yifan Zhong, Ceyao Zhang, Yaodong Yang, Yuanpei Chen

AI summary

DexKnot enables robots to reliably knot diverse, unseen plastic bags by learning a sparse, shape-agnostic keypoint representation that guides a diffusion policy.

Dexterous manipulation, Deformable object manipulation, Diffusion policy, Keypoint correspondence, Generalizable visuomotor learning, Real-world robotics

Problem

Robots struggle to generalize knotting across different plastic bag instances and initial deformations because highly deformable objects have a high-dimensional state space with effectively infinite degrees of freedom.

Approach

The framework collects real-world manual deformation data to train a PointNet++ encoder for shape-agnostic keypoint correspondence, then uses these identified keypoints as low-dimensional inputs to a diffusion transformer policy for generalizable manipulation.
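The matching step can be illustrated with a minimal sketch. It assumes per-point features are already available (random vectors stand in for the PointNet++ encoder output here) and uses cosine-similarity nearest-neighbor lookup, which is an assumed matching rule rather than the paper's confirmed one. The function name `match_keypoints` and all array names are hypothetical.

```python
import numpy as np

def match_keypoints(ref_feats, ref_kp_idx, query_feats, query_points):
    """Find, for each reference keypoint, the most similar point in a new observation.

    ref_feats:    (N_ref, D) per-point features of the reference bag
    ref_kp_idx:   (K,) indices of the chosen keypoints in the reference
    query_feats:  (N_q, D) per-point features of the new observation
    query_points: (N_q, 3) xyz coordinates of the new observation
    """
    def normalize(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

    kp = normalize(ref_feats[ref_kp_idx])   # (K, D) keypoint descriptors
    q = normalize(query_feats)              # (N_q, D)
    sim = kp @ q.T                          # (K, N_q) cosine similarities
    best = sim.argmax(axis=1)               # index of best match per keypoint
    return query_points[best], best

# Toy data: the "new" bag is the reference with shuffled points and slightly
# perturbed features (a stand-in for encoder output, not real data).
rng = np.random.default_rng(0)
ref_feats = rng.normal(size=(200, 16))
ref_kp_idx = np.array([3, 50, 120, 199])
perm = rng.permutation(200)
query_feats = ref_feats[perm] + 0.01 * rng.normal(size=(200, 16))
query_points = rng.uniform(size=(200, 3))

keypoints, matched_idx = match_keypoints(ref_feats, ref_kp_idx, query_feats, query_points)

# Low-dimensional policy input: K keypoints flattened to a K*3 vector.
observation = keypoints.reshape(-1)
```

In this sketch the policy input shrinks from 200 points to 4 keypoints (12 numbers), which is the kind of dimensionality reduction the approach relies on.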

Key results

High success rates across unseen bag instances and novel deformations
Outperforms DP3 and standard Diffusion Policy on out-of-distribution states
Enables cross-instance and cross-deformation generalization with few demonstrations
Real-world keypoint correspondence pipeline bypasses simulation and heavy annotation

Why it matters

It provides a practical, generalizable framework for dexterous manipulation of highly deformable objects, advancing real-world robotic automation for everyday tasks like waste management and retail.

Abstract

Knotting plastic bags is a common task in daily life, yet it is challenging for robots due to the bags’ infinite degrees of freedom and complex physical dynamics. Existing methods often struggle in generalization to unseen bag instances or deformations. To address this, we present DexKnot, a framework that combines keypoint affordance with diffusion policy to learn a generalizable bag-knotting policy. Our approach learns a shape-agnostic representation of bags from keypoint correspondence data collected through real-world manual deformation. For an unseen bag configuration, the keypoints can be identified by matching the representation to a reference. These keypoints are then provided to a diffusion transformer, which generates robot action based on a small number of human demonstrations. DexKnot enables effective policy generalization by reducing the dimensionality of observation space into a sparse set of keypoints. Experiments show that DexKnot achieves reliable and consistent knotting performance across a variety of previously unseen instances and deformations.
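As a rough illustration of how a diffusion policy turns keypoints into actions, the sketch below runs a standard DDPM-style reverse sampling loop conditioned on keypoints. Everything here is an assumption made for illustration: the abstract does not specify the sampler, noise schedule, horizon, or action dimension, and `toy_eps_model` is a hand-written stand-in for the learned diffusion transformer, not the authors' network.

```python
import numpy as np

def sample_actions(eps_model, keypoints, horizon, action_dim, betas, rng):
    """DDPM-style ancestral sampling of an action sequence, conditioned on keypoints."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.normal(size=(horizon, action_dim))      # start from pure noise
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, keypoints)            # predicted noise at step t
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.normal(size=x.shape) if t > 0 else 0.0   # no noise on the last step
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Hypothetical linear noise schedule with 50 steps.
betas = np.linspace(1e-4, 0.02, 50)
alpha_bars = np.cumprod(1.0 - betas)

def toy_eps_model(x, t, keypoints):
    """Stand-in for the learned network: an oracle that treats the keypoint
    centroid, repeated over the horizon, as the clean action sequence."""
    x0 = np.tile(keypoints.mean(axis=0), (x.shape[0], 1))
    return (x - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])

rng = np.random.default_rng(0)
keypoints = rng.uniform(size=(4, 3))                # K = 4 keypoints in 3D
horizon, action_dim = 8, 3
target = np.tile(keypoints.mean(axis=0), (horizon, 1))

actions = sample_actions(toy_eps_model, keypoints, horizon, action_dim, betas, rng)
```

Because the toy model is an exact noise oracle, the sampler recovers `target`. A trained network would instead produce actions learned from the human demonstrations.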

Index terms

Representation Learning, Imitation Learning, Dexterous Manipulation