DiffDef: A Diffusion Model for Generating Multimodal Goal Shapes from Demonstrations for Deformable Object Manipulation
Bao Thach, Tanner Watts, Siyeon Kim, Britton Jordan, Mohanraj Devendran Shanthi, Shing-Hei Ho, James Ferguson, Tucker Hermans, Alan Kuntz
AI summary
Problem
Prior deformable manipulation methods rely on manual goal specification or deterministic models that collapse multimodal success strategies into single, often infeasible averages.
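The mode-averaging failure can be illustrated with a minimal sketch that is not taken from the paper. It assumes a hypothetical one-dimensional "goal shape" parameter with two equally valid strategies near -1 and +1. A deterministic predictor trained with squared error converges to the mean of the demonstrated goals, which here lies between the two modes and matches neither.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D goal parameter: demonstrations succeed near -1 or near +1.
modes = rng.choice([-1.0, 1.0], size=1000)
goals = modes + 0.05 * rng.standard_normal(1000)

# The constant prediction that minimizes mean squared error is the sample mean.
deterministic_goal = goals.mean()

# Distance from that prediction to the closest demonstrated strategy.
dist_to_nearest_mode = min(abs(deterministic_goal - 1.0),
                           abs(deterministic_goal + 1.0))

print(f"deterministic goal: {deterministic_goal:.3f}")
print(f"distance to nearest valid mode: {dist_to_nearest_mode:.3f}")
```

The averaged goal sits close to 0, roughly one unit away from either strategy that actually appeared in the demonstrations.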
Approach
DiffDef trains a conditional diffusion model on demonstration data to learn and sample from a distribution of valid goal shapes, conditioned on the current object state and task context.
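The sampling side of a conditional diffusion model can be sketched as follows. This is a toy illustration under stated assumptions, not DiffDef's architecture or training code: goals are a single scalar, the condition is a hypothetical scalar `context` that shifts both modes, and the trained denoising network is replaced by `predict_noise`, a closed-form noise predictor for a known two-mode Gaussian mixture. The noise schedule and the DDPM-style ancestral sampling loop are common defaults, chosen here for illustration only.

```python
import numpy as np

# Linear noise schedule (a common DDPM default; assumed here).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# Hypothetical goal distribution: two modes at context - 1 and context + 1.
MODE_OFFSET = 1.0
MODE_STD = 0.05


def predict_noise(x_t, t, context):
    """Closed-form stand-in for a trained conditional denoising network."""
    a_bar = alpha_bars[t]
    m = np.sqrt(a_bar) * MODE_OFFSET                  # mode offset after noising
    v = a_bar * MODE_STD**2 + (1.0 - a_bar)           # per-mode variance after noising
    centered = x_t - np.sqrt(a_bar) * context
    expected_center = m * np.tanh(m * centered / v)   # responsibility-weighted mode
    return np.sqrt(1.0 - a_bar) * (centered - expected_center) / v


def sample_goals(context, n_samples, seed=0):
    """Ancestral sampling: start from Gaussian noise and denoise step by step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    for t in range(T - 1, -1, -1):
        eps = predict_noise(x, t, context)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(n_samples) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


samples = sample_goals(context=0.5, n_samples=1000)
print("fraction near upper mode:", np.mean(np.abs(samples - 1.5) < 0.3))
print("fraction near lower mode:", np.mean(np.abs(samples + 0.5) < 0.3))
```

Each draw lands near one of the two modes rather than at their midpoint, which is the behavior the summary attributes to sampling from a learned goal distribution. In a real system the analytic `predict_noise` would be replaced by a network trained on demonstrations and conditioned on the observed object state.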
Key results
- Captures bimodal goal distributions without mode-averaging artifacts
- Achieves >95% success rate with only 10 human demonstrations in simulation
- Outperforms DefGoalNet across surgical and manufacturing tasks
- Reaches 100% retraction success on the physical dVRK surgical robot
Why it matters
Provides a sample-efficient, robust framework for controlling deformable objects in safety-critical robotic applications like surgery and manufacturing.
Abstract
Deformable object manipulation is a key capability in many robotic applications. A promising paradigm for this problem is shape servoing, which aims to control deformable objects toward desired goal shapes. However, existing approaches typically rely on impractical goal-shape acquisition methods, such as domain-knowledge engineering or manual manipulation. Moreover, prior methods generally assume a single deterministic goal and fail to handle multimodal goal settings, a common scenario in many real-world tasks where multiple distinct goal shapes can all lead to successful task completion. In this paper, we introduce DiffDef, a novel neural network that uses a diffusion model to learn a distribution of feasible goal shapes rather than predicting a single deterministic outcome. This allows DiffDef to generate diverse goal configurations while avoiding the mode-averaging artifacts common in deterministic predictors. We evaluate our method on several deformable manipulation tasks inspired by manufacturing and surgical applications, both in simulation and on two physical robotic platforms: the da Vinci Research Kit (dVRK) and a bimanual KUKA-based robotic system. The results demonstrate that DiffDef effectively captures multimodal goal distributions and significantly improves task performance in practical robotic settings. Website: sites.google.com/view/diffdef.