Diffusion Stabilizer Policy for Automated Surgical Robot Manipulations
Chon-Lam Ho, Jianshu Hu, Lei Song, Hesheng Wang, Qi Dou, Yutong Ban
AI summary
Problem
Automation of surgical robot tasks lags behind household manipulation because current methods rely on high-quality demonstration data, whereas real-world demonstrations inevitably contain noise or failures that degrade standard diffusion policies.
Approach
The method first trains a diffusion stabilizer on clean data, then continuously updates it on mixed batches of clean and perturbed data, using action prediction error to filter out low-quality transitions.
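The filtering step can be illustrated with a small NumPy function. This is a minimal sketch, not the authors' implementation: the function name `filter_mixed_batch`, the rule "always keep clean samples, keep a perturbed sample only if its error is at or below a quantile of the clean samples' errors", and the 0.9 quantile are all hypothetical choices. The predicted actions are taken as given; how a diffusion policy produces them (for example, after a full denoising pass) is left open.

```python
import numpy as np


def filter_mixed_batch(pred_actions, target_actions, is_clean, quantile=0.9):
    """Return a boolean keep-mask over a mixed clean/perturbed batch.

    Hypothetical rule: clean samples are always kept; a perturbed sample is kept
    only if its action-prediction error is no larger than the given quantile of
    the clean samples' errors in the same batch.
    """
    pred_actions = np.asarray(pred_actions, dtype=float)
    target_actions = np.asarray(target_actions, dtype=float)
    is_clean = np.asarray(is_clean, dtype=bool)
    if not is_clean.any():
        raise ValueError("batch must contain at least one clean sample")
    # Per-sample mean squared error over the action dimensions.
    errors = np.mean((pred_actions - target_actions) ** 2, axis=-1)
    # Threshold taken from the clean samples only (an assumed criterion).
    threshold = np.quantile(errors[is_clean], quantile)
    keep = is_clean | (errors <= threshold)
    return keep, errors, threshold


# Four samples with 2-D actions: the first two are clean, the last two perturbed.
pred = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.05, 0.0]]
target = np.zeros((4, 2))
keep, errors, threshold = filter_mixed_batch(pred, target, [True, True, False, False])
print(keep.tolist())  # [True, True, False, True]
```

Here the third sample, whose predicted action is far from its label, is dropped, while the fourth perturbed sample is close enough to the clean-error threshold to be kept.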
Key results
- 31% average success rate gain under action-level perturbations
- 28% average success rate gain under trajectory-level perturbations
- Outperforms standard diffusion and imitation learning baselines across complex surgical tasks
- Validated successfully on real-world dVRK robot experiments with imperfect data
Why it matters
Provides a practical pathway to scale data-driven surgical automation by making training robust to the noisy, imperfect demonstrations inherent in real-world data collection.
Abstract
Intelligent surgical robots have the potential to revolutionize clinical practice by enabling more precise and automated surgical procedures. However, the automation of such robots for surgical tasks remains under-explored compared to recent advancements in solving household manipulation tasks. These successes have been largely driven by (1) advanced models, such as transformers and diffusion models, and (2) large-scale data utilization. Aiming to extend these successes to the domain of surgical robotics, we propose a diffusion-based policy learning framework, called Diffusion Stabilizer Policy (DSP), which enables training with imperfect, perturbed or even failed trajectories. Our approach consists of two stages: first, we train the diffusion stabilizer policy using only clean data. Then, the policy is continuously updated using a mixture of clean and perturbed data, with filtering based on the prediction error on actions. Comprehensive experiments conducted in both simulation and the real world demonstrate the superior performance of our method under different types of perturbations.
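The two-stage schedule described in the abstract can be illustrated end to end on a toy problem. This is a sketch under loud assumptions, not DSP itself: a linear policy `a = W o` trained by gradient descent stands in for the diffusion policy, the demonstrations are synthetic, and perturbed samples are kept only when their error is at or below the 90th percentile of the clean samples' errors. All names, sizes and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth linear policy; a = W_true @ o stands in for a diffusion policy.
W_true = np.array([[1.0, -0.5, 0.2],
                   [0.3, 0.8, -1.0]])


def make_demos(n, noise_std):
    """Synthetic demonstrations: 3-D observations, 2-D actions with small noise."""
    obs = rng.normal(size=(n, 3))
    act = obs @ W_true.T + rng.normal(scale=noise_std, size=(n, 2))
    return obs, act


def per_sample_error(W, obs, act):
    # Mean squared action-prediction error for each sample.
    return np.mean((obs @ W.T - act) ** 2, axis=1)


def gd_step(W, obs, act, lr=0.5):
    # One gradient-descent step on the mean squared action error.
    grad = 2.0 / act.size * (obs @ W.T - act).T @ obs
    return W - lr * grad


# Clean demonstrations plus a perturbed set: half near-clean, half corrupted.
clean_obs, clean_act = make_demos(200, noise_std=0.01)
good_obs, good_act = make_demos(100, noise_std=0.01)
bad_obs = rng.normal(size=(100, 3))
bad_act = rng.normal(scale=5.0, size=(100, 2))  # actions unrelated to observations

mix_obs = np.vstack([clean_obs, good_obs, bad_obs])
mix_act = np.vstack([clean_act, good_act, bad_act])
is_clean = np.arange(len(mix_obs)) < len(clean_obs)

# Stage 1: train on clean data only.
W = np.zeros((2, 3))
for _ in range(200):
    W = gd_step(W, clean_obs, clean_act)

# Stage 2: keep updating on the mixture, dropping samples with high prediction error.
for _ in range(200):
    err = per_sample_error(W, mix_obs, mix_act)
    threshold = np.quantile(err[is_clean], 0.9)
    keep = is_clean | (err <= threshold)
    W = gd_step(W, mix_obs[keep], mix_act[keep])

# Baseline for comparison: least squares on the unfiltered mixture.
W_naive = np.linalg.lstsq(mix_obs, mix_act, rcond=None)[0].T

filtered_err = np.linalg.norm(W - W_true)
naive_err = np.linalg.norm(W_naive - W_true)
print(f"filtered: {filtered_err:.4f}  naive: {naive_err:.4f}")
```

In this toy setting the corrupted samples should be excluded by the filter, so the filtered policy stays close to `W_true`, whereas the unfiltered baseline is pulled toward zero by the 100 corrupted actions. Full-batch updates are used only to keep the sketch deterministic; a practical setup would more likely sample minibatches from the clean and perturbed pools.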