RAG-RUSS: A Retrieval-Augmented Robotic Ultrasound for Autonomous Carotid Examination
Dianye Huang, Ziping Cong, Nassir Navab, Zhongliang Jiang
AI summary
Problem
Existing robotic ultrasound systems rely either on rigid rule-based policies or on end-to-end learned models that operate as black boxes. This lack of transparency limits clinical acceptance and raises safety concerns, while the scarcity of medical data makes it hard to train models that depend on large datasets.
Approach
The authors introduce RAG-RUSS, which couples a vision-language model with a retrieval engine to fetch similar past scan contexts. This allows the system to interpret current ultrasound images, explain the active scanning stage, and predict the next probe motion without relying on massive supervised datasets.
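The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `ScanContext` record, the toy three-dimensional embeddings, the stage and motion labels, and the prompt wording are all hypothetical, and cosine similarity is assumed as the similarity measure purely for illustration. A real system would obtain embeddings from an image encoder and pass the assembled prompt to a vision-language model.

```python
import math
from dataclasses import dataclass


@dataclass
class ScanContext:
    """Hypothetical record of one past scan moment (not from the paper)."""
    embedding: list  # assumed image feature vector
    stage: str       # assumed scanning-stage label
    next_motion: str # assumed probe-motion label


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, memory, k=2):
    """Return the k stored contexts most similar to the query embedding."""
    ranked = sorted(
        memory,
        key=lambda ctx: cosine_similarity(query, ctx.embedding),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(retrieved):
    """Assemble a text prompt that lists the retrieved contexts as references."""
    lines = ["Similar past scan contexts:"]
    for i, ctx in enumerate(retrieved, 1):
        lines.append(f"{i}. stage={ctx.stage}; next motion={ctx.next_motion}")
    lines.append(
        "Explain the current scanning stage and predict the next probe motion."
    )
    return "\n".join(lines)


# Toy memory bank with made-up embeddings and labels.
memory = [
    ScanContext([1.0, 0.0, 0.0], "transverse sweep", "translate along the neck"),
    ScanContext([0.0, 1.0, 0.0], "plane transition", "rotate the probe"),
    ScanContext([0.0, 0.0, 1.0], "longitudinal view", "hold position"),
]

query = [0.9, 0.1, 0.0]  # assumed embedding of the current ultrasound image
top = retrieve(query, memory, k=2)
prompt = build_prompt(top)
print(prompt)
```

Because the retrieved examples are supplied at inference time rather than baked into model weights, the memory bank can in principle be extended with new scans without retraining, which is the data-efficiency argument the summary makes.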
Key results
- Successfully identifies current scanning stages and generates anatomical explanations
- Autonomously plans and executes probe motions across transverse and longitudinal planes
- Trained on data from 28 volunteers and tested on four volumetric scans from previously unseen volunteers
- First robotic ultrasound framework to unify perception, explanation, and action for carotid exams
Why it matters
Provides a transparent, data-efficient pathway for deploying trustworthy autonomous robotic ultrasound systems in clinical practice.
Abstract
Robotic ultrasound (US) has recently attracted increasing attention as a means to overcome the limitations of conventional US examinations, such as strong operator dependence. However, the decision-making process of existing methods is often either rule-based or relies on end-to-end learning models that operate as black boxes. This is regarded as a major barrier to clinical acceptance and raises safety concerns for widespread adoption in routine practice. To tackle this challenge, we introduce RAG-RUSS, an interpretable framework capable of performing a full carotid examination in accordance with the clinical workflow while explicitly explaining both the current stage and the next planned action. Furthermore, given the scarcity of medical data, we incorporate retrieval-augmented generation to enhance generalization and reduce dependence on large-scale training datasets. The method was trained on data acquired from 28 volunteers, while an additional four volumetric scans recorded from previously unseen volunteers were reserved for testing. The results demonstrate that the method can explain the current scanning stage and autonomously plan probe motions to complete the carotid examination, encompassing both transverse and longitudinal planes. Code: https://github.com/congzp/USrobot