ICRA 2026

Diffusion Stabilizer Policy for Automated Surgical Robot Manipulations

Chon-Lam Ho, Jianshu Hu, Lei Song, Hesheng Wang, Qi Dou, Yutong Ban

AI summary

A diffusion-based filtering framework enables robust surgical robot training by safely leveraging imperfect, perturbed, or failed demonstration data.

Diffusion policy, Surgical robotics, Imitation learning, Perturbed data, Robot manipulation, Data filtering

Problem

Automation of surgical robot tasks lags behind household manipulation because current methods rely on high-quality data, whereas real-world demonstrations inevitably contain noise or failures that degrade standard diffusion policies.

Approach

The method trains a diffusion stabilizer on clean data first, then continuously updates it using a mixed batch of clean and perturbed data filtered by prediction error to discard low-quality transitions.
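The filtering step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the L2 error metric, the fixed `threshold`, and the names `toy_stabilizer`, `filter_perturbed`, and `mixed_batch` are all assumptions made for illustration, and a simple deterministic function stands in for the trained diffusion stabilizer.

```python
import numpy as np

def prediction_error(predict_action, observations, actions):
    """Per-transition error between predicted and demonstrated actions.

    predict_action is a hypothetical stand-in for the stabilizer trained on
    clean data; the L2 norm is an assumed choice of error metric.
    """
    predicted = predict_action(observations)
    return np.linalg.norm(predicted - actions, axis=1)

def filter_perturbed(predict_action, observations, actions, threshold):
    """Keep only transitions whose prediction error is at most threshold."""
    errors = prediction_error(predict_action, observations, actions)
    keep = errors <= threshold
    return observations[keep], actions[keep], keep

def mixed_batch(clean_obs, clean_act, kept_obs, kept_act):
    """Concatenate clean data with the perturbed transitions that passed the filter."""
    return (np.concatenate([clean_obs, kept_obs]),
            np.concatenate([clean_act, kept_act]))

# Toy stand-in for the stabilizer: the "correct" action is twice the observation.
def toy_stabilizer(obs):
    return 2.0 * obs

clean_obs = np.array([[0.0], [1.0]])
clean_act = np.array([[0.0], [2.0]])

# Perturbed demonstrations: the last action is far from what the stabilizer predicts.
pert_obs = np.array([[0.5], [1.5], [2.0]])
pert_act = np.array([[1.1], [3.0], [9.0]])

kept_obs, kept_act, keep = filter_perturbed(
    toy_stabilizer, pert_obs, pert_act, threshold=0.5
)
batch_obs, batch_act = mixed_batch(clean_obs, clean_act, kept_obs, kept_act)
```

In this toy setup the third perturbed transition is discarded, and the resulting batch mixes the two clean transitions with the two perturbed ones that passed the filter. How the threshold is chosen and how often the filter is re-applied during training are not specified in this summary.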

Key results

31% average success rate gain under action-level perturbations
28% average success rate gain under trajectory-level perturbations
Outperforms standard diffusion and imitation learning baselines across complex surgical tasks
Validated successfully on real-world dVRK robot experiments with imperfect data

Why it matters

Provides a practical pathway to scale data-driven surgical automation by making training robust to the noisy, imperfect demonstrations inherent in real-world data collection.

Abstract

Intelligent surgical robots have the potential to revolutionize clinical practice by enabling more precise and automated surgical procedures. However, the automation of such robots for surgical tasks remains under-explored compared to recent advancements in solving household manipulation tasks. These successes have been largely driven by (1) advanced models, such as transformers and diffusion models, and (2) large-scale data utilization. Aiming to extend these successes to the domain of surgical robotics, we propose a diffusion-based policy learning framework, called Diffusion Stabilizer Policy (DSP), which enables training with imperfect, perturbed, or even failed trajectories. Our approach consists of two stages: first, we train the diffusion stabilizer policy using only clean data. Then, the policy is continuously updated using a mixture of clean and perturbed data, with filtering based on the prediction error on actions. Comprehensive experiments conducted in both simulation and the real world demonstrate the superior performance of our method under different types of perturbations.

Index terms

Surgical Robotics: Planning; Surgical Robotics: Laparoscopy; Learning from Demonstration