ICRA 2026

OCT Imaging for Pose Estimation and Feedback Control of an Articulated Magnetic Surgical Tool

Erik Fredin, Nirmal Pol, Anton Zaliznyi, Dmytro Fishman, Eric D. Diller, Lueder Alexander Kahrs

AI summary

A sparse deep learning model adapted from LiDAR object detection estimates the pose of articulated magnetic surgical tools from OCT scans in real time and with high accuracy, supporting robust closed-loop feedback control.

Magnetic surgical tools, OCT imaging, pose estimation, deep learning, sparse CNN, feedback control

Problem

Safe control of miniaturized, multi-jointed magnetic surgical tools requires real-time 3D pose feedback, but existing OCT-based methods are too slow, limited to rigid needles, or require physical markers.

Approach

The authors benchmark eight deep learning models on a novel dataset of volumetric OCT scans containing a markerless articulated magnetic gripper under realistic surgical artifacts, adapting the LiDAR-based VoxelNeXt architecture for sparse 3D keypoint detection.
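Sparse architectures such as VoxelNeXt operate on lists of occupied voxels rather than on a dense grid. The snippet below is a minimal sketch of that idea, not the authors' preprocessing: it assumes a simple intensity threshold, and the names `sparsify_volume` and `threshold` as well as the toy volume are hypothetical.

```python
from typing import List, Tuple

# (z, y, x, intensity) for one occupied voxel; this layout is an assumption
Voxel = Tuple[int, int, int, float]


def sparsify_volume(volume: List[List[List[float]]], threshold: float) -> List[Voxel]:
    """Keep only voxels whose intensity exceeds `threshold` (hypothetical rule)."""
    sparse: List[Voxel] = []
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value > threshold:
                    sparse.append((z, y, x, value))
    return sparse


# Toy 2 x 2 x 3 "OCT volume"; real scans are far larger and noisier.
volume = [
    [[0.0, 0.1, 0.9], [0.0, 0.0, 0.8]],
    [[0.0, 0.0, 0.0], [0.7, 0.0, 0.0]],
]
sparse = sparsify_volume(volume, threshold=0.5)
occupancy = len(sparse) / (2 * 2 * 3)  # fraction of voxels kept
```

When only a small fraction of voxels is occupied, a network that processes just those voxels does less work than one that convolves over the full grid, which is one plausible reason for the speed of sparse models.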

Key results

VoxelNeXt achieves 0.6 mm position error and 5° angular error with a 40 ms inference time
Sparse CNNs outperform dense architectures for markerless pose estimation
Closed-loop PID control successfully demonstrated with real-time OCT feedback
Novel 8-DoF volumetric OCT dataset released with realistic surgical artifacts
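
The closed-loop result can be pictured with a generic discrete PID loop acting on one joint angle. This is a hedged sketch only: the gains, the toy plant (joint velocity equal to the control input), and the use of the 40 ms inference time as the loop period are all assumptions, and the paper's controller and magnetic actuation model are not reproduced here.

```python
class PID:
    """Textbook discrete PID controller (gains are illustrative, not from the paper)."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first step

    def step(self, target: float, measured: float) -> float:
        error = target - measured
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


dt = 0.04             # assumed loop period in seconds (one pose estimate per step)
target_angle = 30.0   # desired joint angle in degrees (hypothetical)
angle = 0.0           # simulated joint angle; stands in for the OCT pose estimate
pid = PID(kp=2.0, ki=0.5, kd=0.05, dt=dt)

for _ in range(500):
    u = pid.step(target_angle, angle)
    angle += u * dt   # toy plant: the control input sets the joint velocity

final_error = abs(target_angle - angle)
```

In a real system, `angle` would come from the pose estimator at each step rather than from a simulated plant.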

Why it matters

Enables real-time, markerless pose feedback for miniaturized magnetic surgical robots, advancing safe and precise minimally invasive neurosurgery.

Abstract

Magnetically-driven surgical tools are a new class of millimetre-scale devices that could enable procedures such as minimally invasive neurosurgery due to their high dexterity at a small size. However, safe and effective control of these magnetic tools necessitates real-time observation of tool joint angles, which is challenging inside a surgical environment. Optical coherence tomography (OCT) is an emerging volumetric imaging technique offering 3D visualization of tissue and tools simultaneously, which we explore for joint angle estimation. While some previous studies have used OCT for estimating the pose of rigid instruments, those methods are specific to needle-like tools, and often have slow processing speed. In this work, we benchmark eight deep-learning models adapted from other 3D modalities to OCT data showing magnetic tools in a mock surgical environment. The models are tested in the presence of other objects, occlusion, noise, and the tool being partially outside of the OCT’s field of view. The best performing model, VoxelNeXt, is adapted from 3D object detection in LiDAR scans, the first time a model of this kind is used on medical data. It infers tool pose with 0.6 mm position and 5° angular errors, with 40 ms inference time. We use this model to provide feedback for controlling a multi-jointed magnetic tool, demonstrating the robustness of OCT-based feedback control. Code and dataset are available at https://medcvr.utm.utoronto.ca/ral2025-oct-pose.html.
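
For readers unfamiliar with pose-error metrics, the sketch below shows one common way to compute a position error and an angular error. The abstract does not define its metrics, so the Euclidean distance between keypoints and the angle between direction vectors are assumptions, as are the function names.

```python
import math
from typing import Sequence


def position_error_mm(pred: Sequence[float], true: Sequence[float]) -> float:
    """Euclidean distance between predicted and true 3D points, in mm."""
    return math.dist(pred, true)


def angular_error_deg(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in degrees between two nonzero direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_angle))


# Hypothetical values chosen for easy checking, not taken from the dataset.
pos_err = position_error_mm((0.0, 0.0, 0.0), (0.3, 0.4, 0.0))
ang_err = angular_error_deg((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```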

Index terms

Computer Vision for Medical Robotics; Machine Learning for Robot Control; Deep Learning for Visual Perception