ICRA 2026

DiffDef: A Diffusion Model for Generating Multimodal Goal Shapes from Demonstrations for Deformable Object Manipulation

Bao Thach, Tanner Watts, Siyeon Kim, Britton Jordan, Mohanraj Devendran Shanthi, Shing-Hei Ho, James Ferguson, Tucker Hermans, Alan Kuntz


AI summary


DiffDef leverages a conditional diffusion model to generate diverse, multimodal goal shapes from demonstrations, significantly outperforming deterministic methods in deformable object manipulation.

Deformable manipulation, Diffusion models, Multimodal goal generation, Shape servoing, Robotic surgery, Learning from demonstrations

Problem

Prior deformable manipulation methods rely on manual goal specification or deterministic models that collapse multimodal success strategies into single, often infeasible averages.
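The mode-averaging failure can be illustrated with a toy sketch. This is not taken from the paper: the 1-D "goal shapes", the two modes at -1 and +1, and all variable names are assumptions chosen for illustration. A predictor trained with squared error converges to the mean of the demonstrated goals, which here lies between the two modes and matches neither.

```python
import random
import statistics

random.seed(0)

# Hypothetical 1-D "goal shapes": half of the demonstrations end near -1.0,
# the other half near +1.0 (two equally valid ways to complete the task).
demos = [random.gauss(-1.0, 0.05) for _ in range(500)] + \
        [random.gauss(+1.0, 0.05) for _ in range(500)]

# A deterministic predictor trained with squared error converges to the mean.
mse_optimal_goal = statistics.fmean(demos)

# Distance from the averaged goal to the closest demonstrated goal.
gap_to_nearest_demo = min(abs(g - mse_optimal_goal) for g in demos)

# Sampling from the demonstrated distribution always returns a goal in one mode.
sampled_goal = random.choice(demos)

print(f"averaged goal: {mse_optimal_goal:+.3f}")
print(f"gap to nearest demonstrated goal: {gap_to_nearest_demo:.3f}")
print(f"sampled goal: {sampled_goal:+.3f}")
```

The averaged goal sits near zero, far from every demonstrated goal, whereas a sampled goal always lies in one of the two modes.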

Approach

DiffDef trains a conditional diffusion model on demonstration data to learn and sample from a distribution of valid goal shapes, conditioned on the current object state and task context.
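As a rough illustration of conditional diffusion sampling, the sketch below runs standard DDPM ancestral sampling on a 1-D toy problem. It is not the DiffDef architecture. The scalar `context`, the two goal modes at `context - 1` and `context + 1`, the noise schedule, and the closed-form `predict_noise` function (standing in for a trained noise-prediction network) are all assumptions made so the example runs without training.

```python
import math
import random

random.seed(0)

# Assumed linear noise schedule; the paper's schedule is not given here.
T = 200
betas = [1e-4 + (0.05 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)

MODE_OFFSETS = (-1.0, 1.0)  # assumed: two equally valid goals around the context
MODE_STD = 0.1              # assumed spread of each goal mode


def predict_noise(x_t, t, context):
    """Closed-form stand-in for a trained, context-conditioned noise predictor."""
    ab = alpha_bars[t]
    var = ab * MODE_STD ** 2 + (1.0 - ab)
    mus = [math.sqrt(ab) * (context + off) for off in MODE_OFFSETS]
    # Responsibility of each mode for x_t (softmax over Gaussian log-likelihoods).
    logits = [-(x_t - mu) ** 2 / (2.0 * var) for mu in mus]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    # Expected noise = -sqrt(1 - ab) * score of the noised two-mode mixture.
    return math.sqrt(1.0 - ab) * sum(
        w / total * (x_t - mu) / var for w, mu in zip(weights, mus)
    )


def sample_goal(context):
    """DDPM ancestral sampling: start from Gaussian noise, denoise step by step."""
    x = random.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        eps = predict_noise(x, t, context)
        mean = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
        noise = random.gauss(0.0, 1.0) if t > 0 else 0.0
        x = mean + math.sqrt(betas[t]) * noise
    return x


context = 0.3
goals = [sample_goal(context) for _ in range(200)]
lower = sum(1 for g in goals if g < context)
print(f"{lower} samples near {context - 1.0:+.1f}, {len(goals) - lower} near {context + 1.0:+.1f}")
```

Each call to `sample_goal` lands in one of the two modes rather than between them, which is the behavior the summary attributes to sampling from a learned goal distribution. In a real system the scalar state would be replaced by a shape representation and `predict_noise` by a network trained on demonstrations.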

Key results

Captures bimodal goal distributions without mode-averaging artifacts
Achieves >95% success rate with only 10 human demonstrations in simulation
Outperforms DefGoalNet across surgical and manufacturing tasks
Reaches 100% retraction success on a physical dVRK surgical robot

Why it matters

Provides a sample-efficient, robust framework for controlling deformable objects in safety-critical robotic applications like surgery and manufacturing.

Abstract

Deformable object manipulation is a key capability in many robotic applications. A promising paradigm for this problem is shape servoing, which aims to control deformable objects toward desired goal shapes. However, existing approaches typically rely on impractical goal-shape acquisition methods, such as domain-knowledge engineering or manual manipulation. Moreover, prior methods generally assume a single deterministic goal and fail to handle multimodal goal settings, a common scenario in many real-world tasks where multiple distinct goal shapes can all lead to successful task completion. In this paper, we introduce DiffDef, a novel neural network that uses a diffusion model to learn a distribution of feasible goal shapes rather than predicting a single deterministic outcome. This allows DiffDef to generate diverse goal configurations while avoiding the mode-averaging artifacts common in deterministic predictors. We evaluate our method on several deformable manipulation tasks inspired by manufacturing and surgical applications, both in simulation and on two physical robotic platforms: the da Vinci Research Kit (dVRK) and a bimanual KUKA-based robotic system. The results demonstrate that DiffDef effectively captures multimodal goal distributions and significantly improves task performance in practical robotic settings. Website: sites.google.com/view/diffdef.

Index terms

Surgical Robotics: Planning; Learning from Demonstration; Bimanual Manipulation