SurgSync: Time-Synchronized Multi-Modal Data Collection Framework and Dataset for Surgical Robotics
Haoying Zhou, Chang Liu, Yimeng Wu, Junlin Wu, Zijian Wu, Yu Chung Lee, Sara Martuscelli, Septimiu E. Salcudean, Gregory Scott Fischer, Peter Kazanzides
AI summary
Problem
Existing surgical robotics datasets lack precise time alignment across modalities, suffer from outdated imaging pipelines, and cover limited tasks, which hinders the development of robust AI models for surgery.
Approach
The authors built SurgSync, an open-source framework that combines dual-mode synchronized recorders, a modern chip-on-tip stereo endoscope, and a custom capacitive contact sensor to collect and process temporally aligned visual, kinematic, and tactile data on the dVRK platform.
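As a rough illustration of what offline matching between recorded streams can look like, the sketch below pairs each image timestamp with the nearest kinematic sample and rejects pairs whose gap exceeds a tolerance. This is a generic nearest-timestamp approach, not necessarily the method SurgSync uses; the function name `match_nearest`, the tolerance value, and the sample timestamps are all hypothetical.

```python
from bisect import bisect_left

def match_nearest(ref_ts, other_ts, tol):
    """Pair each reference timestamp with the nearest entry in other_ts.

    other_ts must be sorted in ascending order. Returns one entry per
    reference timestamp: the index into other_ts, or None when the
    closest sample is further away than tol (same units as the stamps).
    """
    matches = []
    for t in ref_ts:
        pos = bisect_left(other_ts, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(other_ts)]
        best = min(candidates, key=lambda i: abs(other_ts[i] - t), default=None)
        if best is None or abs(other_ts[best] - t) > tol:
            matches.append(None)
        else:
            matches.append(best)
    return matches

# Hypothetical timestamps in seconds (assumed values, not from the dataset).
image_ts = [0.000, 0.033, 0.066, 0.500]
kinematic_ts = [0.001, 0.010, 0.030, 0.035, 0.070]
print(match_nearest(image_ts, kinematic_ts, tol=0.01))  # [0, 3, 4, None]
```

The last image has no kinematic sample within 10 ms, so it is left unmatched rather than paired with stale data.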
Key results
- Dual-mode synchronized recorders for precise temporal alignment
- Modern stereo endoscope achieving >30× higher image sharpness
- Capacitive contact sensor providing tool-tissue ground truth with up to 99.1% accuracy
- 214 validated multi-modal recordings across canonical surgical training tasks
Why it matters
Provides the surgical robotics and AI communities with a high-quality, open-source resource to train and evaluate perception, skill assessment, and autonomy algorithms.
Abstract
Most existing robotic surgery systems adopt a human-in-the-loop paradigm, often with the surgeon directly teleoperating the robotic system. Adding intelligence to these robots would enable higher-level control, such as supervised autonomy or even full autonomy. However, artificial intelligence (AI) requires large amounts of training data, which is currently lacking. This work proposes SurgSync, a multi-modal data collection framework with offline and online synchronization to support training and real-time inference, respectively. The framework is implemented on a da Vinci Research Kit (dVRK) and introduces (1) dual-mode (online/offline-matching) synchronized recorders, (2) a modern stereo endoscope to achieve image quality on par with clinical systems, and (3) additional sensors such as a side-view camera and a novel capacitive contact sensor to provide ground truth contact data. The framework also incorporates a post-processing toolbox for tasks such as depth estimation, optical flow, and a practical kinematic reprojection method using Gaussian heatmaps. User studies with participants of varying skill levels are performed with ex-vivo tissue to provide clinically realistic data, and a network for surgical skill assessment is employed to demonstrate use of the collected data. Through the user study experiments, we obtained a dataset of 214 validated instances across multiple canonical training tasks. All software and data are available at surgsync.github.io.
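The abstract does not detail the kinematic reprojection method, so the following is only a minimal sketch of the general idea: project a 3D tool point (already expressed in the camera frame) into the image with a pinhole model, then render an isotropic Gaussian heatmap centered on that pixel. The function names, intrinsics, image size, and `sigma` are illustrative assumptions, not values from SurgSync.

```python
import math

def project_point(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in the camera frame to pixel coordinates."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return fx * x / z + cx, fy * y / z + cy

def gaussian_heatmap(width, height, center, sigma):
    """Return a height x width grid (list of rows) with a Gaussian peak at center.

    center is (u, v) in pixels; the peak value is 1.0 when the center
    falls exactly on a pixel.
    """
    u, v = center
    return [
        [
            math.exp(-((col - u) ** 2 + (row - v) ** 2) / (2.0 * sigma ** 2))
            for col in range(width)
        ]
        for row in range(height)
    ]

# Hypothetical intrinsics and tool-tip position (assumed for illustration).
u, v = project_point((0.0, 0.0, 0.1), fx=500.0, fy=500.0, cx=32.0, cy=24.0)
heat = gaussian_heatmap(width=64, height=48, center=(u, v), sigma=3.0)
print((u, v), heat[24][32])  # (32.0, 24.0) 1.0
```

A soft heatmap of this kind tolerates small calibration errors better than a single-pixel label, since nearby pixels still carry a nonzero target value.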