Learning Optimal Strategies for Needle Handover in Surgical Suturing
Cholin Kim, Jeonghyeon Yoon, Hakyun Lee, Sihyeoung Park, Hyojae Park, Seungjun Lee, Michael Yip, Minho Hwang
AI summary
Problem
Automating surgical needle handover is hindered by combinatorial grasping strategies, pose estimation uncertainties, and cable-driven robot inaccuracies, making it difficult to minimize unnecessary handovers and maintain reliability.
Approach
The authors formulate needle pickup and handover as a goal-oriented reinforcement learning task, training a DQN agent with simulated kinematic disturbances to learn optimal grasping sequences that minimize handovers while respecting joint limits.
Key results
- First framework unifying pickup and handover planning for target grasping
- DQN policy trained with joint margin rewards and disturbance modeling for robustness
- Achieved human-level handover efficiency (1.65 ± 0.50 vs. 1.62 ± 0.55 attempts) on dVRK
- Improved consistency and joint-limit avoidance compared to human teleoperation
Why it matters
Demonstrates the viability of reinforcement learning for safe, reliable automation of critical surgical subtasks, paving the way for more autonomous robotic surgery.
Abstract
Automation of suturing subtasks, such as needle handover, has the potential to reduce surgeons’ fatigue and improve surgical efficiency. Needle handover is particularly challenging due to the combinatorial nature of grasping and handover strategies, uncertainties in needle pose estimation, and inaccuracies inherent in cable-driven surgical robots such as the da Vinci system. In this work, we present a reinforcement learning framework for needle handover, spanning the process from initial pickup to a desired grasping state. We formulate the task as a goal-oriented planning problem and design a state–action representation that captures grasping and handover configurations. A DQN-based policy is trained with disturbances that reflect real-world kinematic errors to ensure robustness. The learned policy was validated on the da Vinci Research Kit (dVRK) and quantitatively compared with human teleoperation. Results demonstrate that our approach achieves human-level efficiency in terms of handover attempts (1.65 ± 0.50 vs. 1.62 ± 0.55), while improving consistency and joint-limit avoidance. The proposed framework demonstrates the potential of reinforcement learning for safe and reliable automation of surgical handover and points to opportunities for extending autonomy to more complex handover scenarios.
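As a rough illustration of the ideas named in the abstract (a per-handover cost, a joint-limit margin term, and training-time kinematic disturbances), the Python sketch below shows one way these pieces could be written. Every function name, weight, and the Gaussian noise model here is an assumption made for illustration; the abstract does not specify the actual reward, disturbance model, or network used.

```python
import random


def joint_margin(q, lower, upper):
    """Smallest normalized distance from any joint to its nearest limit.

    Returns 1.0 when every joint sits mid-range, 0.0 when some joint is
    exactly at a limit, and a negative value when a limit is violated.
    (Hypothetical helper, not taken from the paper.)
    """
    margins = []
    for qi, lo, hi in zip(q, lower, upper):
        half_range = (hi - lo) / 2.0
        margins.append(min(qi - lo, hi - qi) / half_range)
    return min(margins)


def perturb(q, sigma, rng):
    """Add zero-mean Gaussian noise to each joint value.

    Stands in for the kinematic errors of a cable-driven arm; the Gaussian
    form is an assumption.
    """
    return [qi + rng.gauss(0.0, sigma) for qi in q]


def step_reward(reached_goal, margin,
                handover_penalty=1.0, margin_weight=0.5, goal_bonus=10.0):
    """Illustrative reward: each handover costs a fixed penalty, staying far
    from joint limits earns a small bonus, and reaching the target grasp
    earns a terminal bonus. All weights are assumed values."""
    if margin < 0.0:
        return -goal_bonus  # treat a joint-limit violation as a failure
    reward = -handover_penalty + margin_weight * margin
    if reached_goal:
        reward += goal_bonus
    return reward


if __name__ == "__main__":
    rng = random.Random(0)
    lower, upper = [-1.0, -1.0], [1.0, 1.0]
    q_nominal = [0.0, 0.5]
    # Evaluate the reward on a disturbed configuration, as a training loop might.
    q_noisy = perturb(q_nominal, sigma=0.05, rng=rng)
    print(step_reward(False, joint_margin(q_noisy, lower, upper)))
```

In a setup like this, a DQN agent would receive `step_reward` after each pickup or handover action, so a policy that reaches the target grasp in fewer handovers while keeping joints away from their limits accumulates a higher return.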