ICRA 2026

SurgSync: Time-Synchronized Multi-Modal Data Collection Framework and Dataset for Surgical Robotics

Haoying Zhou, Chang Liu, Yimeng Wu, Junlin Wu, Zijian Wu, Yu Chung Lee, Sara Martuscelli, Septimiu E. Salcudean, Gregory Scott Fischer, Peter Kazanzides

AI summary

SurgSync is an open-source framework and dataset for time-synchronized multi-modal data collection on the dVRK, targeting the temporal alignment and image quality gaps in existing surgical robotics datasets.

surgical robotics, multi-modal data, time synchronization, dataset, AI training, dVRK

Problem

Existing surgical robotics datasets lack precise time alignment across modalities, suffer from outdated imaging pipelines, and cover limited tasks, which hinders the development of robust AI models for surgery.

Approach

The authors built SurgSync, an open-source framework that combines dual-mode synchronized recorders, a modern chip-on-tip stereo endoscope, and a custom capacitive contact sensor to collect and process temporally aligned visual, kinematic, and tactile data on the dVRK platform.
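
As a rough illustration of what offline temporal alignment between two recorded streams can look like, the sketch below matches each camera frame to the nearest kinematic sample by timestamp and rejects matches outside a tolerance. It is not the SurgSync recorder: the function name `match_nearest`, the integer millisecond timestamps, and the tolerance value are all assumptions made for this example.

```python
from bisect import bisect_left


def match_nearest(ref_stamps, other_stamps, tolerance):
    """Match each reference timestamp to the nearest one in another stream.

    ref_stamps   -- timestamps of the reference stream (e.g. camera frames)
    other_stamps -- sorted timestamps of the stream to align (e.g. kinematics)
    tolerance    -- largest accepted time gap, in the same units
    Returns one index into other_stamps per reference timestamp, or None
    when no sample lies within the tolerance.
    """
    matches = []
    for t in ref_stamps:
        i = bisect_left(other_stamps, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_stamps)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda j: abs(other_stamps[j] - t))
        matches.append(best if abs(other_stamps[best] - t) <= tolerance else None)
    return matches


# Hypothetical timestamps in integer milliseconds.
frame_ms = [0, 33, 66, 500]
kinematic_ms = [1, 11, 21, 31, 41, 62, 71]
pairs = match_nearest(frame_ms, kinematic_ms, tolerance=5)
```

In this sketch the last frame has no kinematic sample within 5 ms, so it is left unmatched rather than paired with stale data.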

Key results

Dual-mode synchronized recorders for precise temporal alignment
Modern stereo endoscope achieving >30× higher image sharpness
Capacitive contact sensor providing tool-tissue ground truth with up to 99.1% accuracy
214 validated multi-modal recordings across canonical surgical training tasks

Why it matters

SurgSync gives the surgical robotics and AI communities an open-source framework and dataset for training and evaluating perception, skill assessment, and autonomy algorithms.

Abstract

Most existing robotic surgery systems adopt a human-in-the-loop paradigm, often with the surgeon directly teleoperating the robotic system. Adding intelligence to these robots would enable higher-level control, such as supervised autonomy or even full autonomy. However, artificial intelligence (AI) requires large amounts of training data, which is currently lacking. This work proposes SurgSync, a multi-modal data collection framework with offline and online synchronization to support training and real-time inference, respectively. The framework is implemented on a da Vinci Research Kit (dVRK) and introduces (1) dual-mode (online/offline-matching) synchronized recorders, (2) a modern stereo endoscope to achieve image quality on par with clinical systems, and (3) additional sensors such as a side-view camera and a novel capacitive contact sensor to provide ground truth contact data. The framework also incorporates a post-processing toolbox for tasks such as depth estimation, optical flow, and a practical kinematic reprojection method using Gaussian heatmaps. User studies with participants of varying skill levels are performed with ex-vivo tissue to provide clinically realistic data, and a network for surgical skill assessment is employed to demonstrate utilization of the collected data. Through the user study experiments, we obtained a dataset of 214 validated instances across multiple canonical training tasks. All software and data are available at surgsync.github.io.
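
The abstract does not detail the kinematic reprojection method, so the following is only a generic sketch of the idea of rendering a kinematic point as a Gaussian heatmap. It assumes a tool-tip position already expressed in the camera frame and an ideal pinhole camera; the function names, intrinsics, image size, and sigma are hypothetical and not taken from the SurgSync toolbox.

```python
import math


def project_point(p_cam, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to pixel coordinates (pinhole model)."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return fx * x / z + cx, fy * y / z + cy


def gaussian_heatmap(width, height, center, sigma):
    """Render an isotropic 2D Gaussian with peak value 1.0 at center = (u, v)."""
    u0, v0 = center
    return [
        [
            math.exp(-((u - u0) ** 2 + (v - v0) ** 2) / (2.0 * sigma ** 2))
            for u in range(width)
        ]
        for v in range(height)
    ]


# Hypothetical values: a tip 10 cm in front of the camera on the optical axis,
# rendered into a tiny 16x12 image so the example stays cheap to run.
tip_px = project_point((0.0, 0.0, 0.1), fx=100.0, fy=100.0, cx=8.0, cy=6.0)
heatmap = gaussian_heatmap(16, 12, tip_px, sigma=2.0)
```

A soft heatmap of this kind tolerates small calibration or kinematic errors better than a single labeled pixel, which is one common reason to prefer it as a training target.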

Index terms

Medical Robots and Systems; Surgical Robotics: Laparoscopy; Data Sets for Robot Learning