OCT Imaging for Pose Estimation and Feedback Control of an Articulated Magnetic Surgical Tool
Erik Fredin, Nirmal Pol, Anton Zaliznyi, Dmytro Fishman, Eric D. Diller, Lueder Alexander Kahrs
AI summary
Problem
Safe control of miniaturized, multi-jointed magnetic surgical tools requires real-time 3D pose feedback, but existing OCT-based methods are too slow, limited to rigid needles, or require physical markers.
Approach
The authors benchmark eight deep learning models on a novel dataset of volumetric OCT scans containing a markerless articulated magnetic gripper under realistic surgical artifacts, adapting the LiDAR-based VoxelNeXt architecture for sparse 3D keypoint detection.
Key results
- VoxelNeXt achieves 0.6 mm position error and 5° angular error with 40 ms inference time
- Sparse CNNs outperform dense architectures for markerless pose estimation
- Closed-loop PID control successfully demonstrated with real-time OCT feedback
- Novel 8-DoF volumetric OCT dataset released with realistic surgical artifacts
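The closed-loop result above can be illustrated with a minimal discrete PID sketch. Everything in it is an assumption for illustration: the gains, the 30° target, and the toy plant (joint angle rate proportional to the command) are hypothetical, and the 40 ms sample period is simply borrowed from the reported inference time. The summary does not specify the authors' controller design, so this should not be read as their implementation.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (hypothetical gains, not the paper's)."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target: float, measured: float) -> float:
        error = target - measured
        self.integral += error * self.dt                  # accumulate I term
        derivative = (error - self.prev_error) / self.dt  # finite-difference D term
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(target_deg: float = 30.0, steps: int = 250, dt: float = 0.04) -> float:
    """Drive a toy joint toward target_deg; returns the final angle in degrees."""
    pid = PID(kp=2.0, ki=1.0, kd=0.0, dt=dt)
    angle = 0.0  # stands in for the joint angle estimated from each OCT volume
    for _ in range(steps):
        command = pid.step(target_deg, angle)
        angle += command * dt  # assumed plant: angle rate proportional to command
    return angle


final_angle = simulate()
```

In a real system the `angle` fed to `step` would come from the pose estimator rather than from a simulated plant, and the command would be mapped to the magnetic actuation input.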
Why it matters
Enables real-time, markerless pose feedback for miniaturized magnetic surgical robots, advancing safe and precise minimally invasive neurosurgery.
Abstract
Magnetically driven surgical tools are a new class of millimetre-scale devices that could enable procedures such as minimally invasive neurosurgery due to their high dexterity at a small size. However, safe and effective control of these magnetic tools necessitates real-time observation of tool joint angles, which is challenging inside a surgical environment. Optical coherence tomography (OCT) is an emerging volumetric imaging technique offering 3D visualization of tissue and tools simultaneously, which we explore for joint angle estimation. While some previous studies have used OCT for estimating the pose of rigid instruments, those methods are specific to needle-like tools and often have slow processing speeds. In this work, we benchmark eight deep learning models adapted from other 3D modalities to OCT data showing magnetic tools in a mock surgical environment. The models are tested in the presence of other objects, occlusion, noise, and the tool being partially outside of the OCT’s field of view. The best-performing model, VoxelNeXt, is adapted from 3D object detection in LiDAR scans, marking the first time a model of this kind has been used on medical data. It infers tool pose with 0.6 mm position error and 5° angular error, with 40 ms inference time. We use this model to provide feedback for controlling a multi-jointed magnetic tool, demonstrating the robustness of OCT-based feedback control. Code and dataset are available at https://medcvr.utm.utoronto.ca/ral2025-oct-pose.html.
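To make the two reported error figures concrete, the following is a minimal sketch of how a position error (in mm) and an angular error (in degrees) are commonly computed for a single prediction. The function names and the choice of a Euclidean keypoint distance and a vector-to-vector angle are assumptions for illustration; the abstract does not state the exact metric definitions the authors use.

```python
import math


def position_error_mm(pred_xyz, ref_xyz):
    """Euclidean distance between a predicted and a reference 3D point (mm)."""
    return math.dist(pred_xyz, ref_xyz)


def angular_error_deg(pred_dir, ref_dir):
    """Angle in degrees between two non-zero 3D direction vectors."""
    dot = sum(p * r for p, r in zip(pred_dir, ref_dir))
    norms = math.hypot(*pred_dir) * math.hypot(*ref_dir)
    # Clamp to guard against rounding pushing the cosine outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cos_angle))


# Hypothetical values, not taken from the paper's dataset.
pos_err = position_error_mm((0.0, 0.0, 0.0), (0.3, 0.4, 0.0))
ang_err = angular_error_deg((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Averaging these per-sample values over a test set would give summary figures of the kind quoted above.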