ICRA 2026

RAG-RUSS: A Retrieval-Augmented Robotic Ultrasound for Autonomous Carotid Examination

Dianye Huang, Ziping Cong, Nassir Navab, Zhongliang Jiang

AI summary

RAG-RUSS enables transparent, data-efficient autonomous carotid ultrasound scanning by using retrieval-augmented vision-language models to explain scanning stages and plan probe motions.

Robotic ultrasound, Retrieval-augmented generation, Vision-language models, Autonomous scanning, Carotid examination, Interpretable AI

Problem

Existing robotic ultrasound systems rely on either rigid rule-based policies or end-to-end learned models that operate as black boxes. The lack of transparency limits clinical acceptance and raises safety concerns, and learned models additionally depend on large training datasets, which are scarce in medicine.

Approach

The authors introduce RAG-RUSS, which couples a vision-language model with a retrieval engine to fetch similar past scan contexts. This allows the system to interpret current ultrasound images, explain the active scanning stage, and predict the next probe motion without relying on massive supervised datasets.
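The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `ScanRecord` fields, the stage and motion labels, the toy 3-dimensional embeddings, and the prompt wording are all hypothetical, and cosine similarity is assumed here as the similarity measure. The idea shown is only the general pattern: embed the current image, fetch the most similar stored scan contexts, and pass them to the vision-language model as context.

```python
import math
from dataclasses import dataclass


@dataclass
class ScanRecord:
    """One stored scan context. All fields are hypothetical placeholders."""
    embedding: list[float]  # image feature vector from some encoder (assumed)
    stage: str              # annotated scanning stage (assumed label set)
    motion: str             # annotated next probe motion (assumed label set)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, bank, k=2):
    """Return the k records whose embeddings are most similar to the query."""
    ranked = sorted(bank, key=lambda r: cosine(query, r.embedding), reverse=True)
    return ranked[:k]


def build_prompt(examples):
    """Format retrieved records as textual context for a vision-language model."""
    lines = ["Similar past scan contexts:"]
    for i, ex in enumerate(examples, 1):
        lines.append(f"{i}. stage={ex.stage}; next motion={ex.motion}")
    lines.append("Given the current image, state the scanning stage "
                 "and the next probe motion.")
    return "\n".join(lines)


# Toy retrieval bank with made-up embeddings and labels.
BANK = [
    ScanRecord([1.0, 0.0, 0.0], "transverse sweep", "translate along the neck"),
    ScanRecord([0.9, 0.1, 0.0], "transverse sweep", "tilt probe"),
    ScanRecord([0.0, 1.0, 0.0], "longitudinal view", "hold position"),
]

# The query embedding would come from the current ultrasound image.
top = retrieve([2.0, 0.0, 0.0], BANK, k=2)
prompt = build_prompt(top)
print(prompt)
```

In a real system the linear scan over `BANK` would typically be replaced by an approximate nearest-neighbor index, and the prompt would accompany the current ultrasound image rather than stand alone.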

Key results

Successfully identifies current scanning stages and generates anatomical explanations
Autonomously plans and executes probe motions across transverse and longitudinal planes
Trained on data from 28 volunteers and tested on four volumetric scans from previously unseen volunteers
First robotic ultrasound framework to unify perception, explanation, and action for carotid exams

Why it matters

Provides a transparent, data-efficient pathway for deploying trustworthy autonomous robotic ultrasound systems in clinical practice.

Abstract

Robotic ultrasound (US) has recently attracted increasing attention as a means to overcome the limitations of conventional US examinations, such as strong operator dependence. However, the decision-making process of existing methods is often either rule-based or relies on end-to-end learning models that operate as black boxes. This has been seen as a major barrier to clinical acceptance and raises safety concerns for widespread adoption in routine practice. To tackle this challenge, we introduce RAG-RUSS, an interpretable framework capable of performing a full carotid examination in accordance with the clinical workflow while explicitly explaining both the current stage and the next planned action. Furthermore, given the scarcity of medical data, we incorporate retrieval-augmented generation to enhance generalization and reduce dependence on large-scale training datasets. The method was trained on data acquired from 28 volunteers, while an additional four volumetric scans recorded from previously unseen volunteers were reserved for testing. The results demonstrate that the method can explain the current scanning stage and autonomously plan probe motions to complete the carotid examination, encompassing both transverse and longitudinal planes. Code: https://github.com/congzp/USrobot
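The evaluation protocol in the abstract holds out scans from volunteers who never appear in training. A minimal sketch of such a volunteer-level split follows. The volunteer IDs, the random seed, and the assumption of one held-out scan per unseen volunteer (32 volunteers in total) are illustrative only and are not taken from the paper or its repository.

```python
import random


def split_by_volunteer(volunteer_ids, n_test=4, seed=0):
    """Split at the volunteer level so no person appears in both sets."""
    ids = sorted(set(volunteer_ids))   # deduplicate and fix the order
    rng = random.Random(seed)          # local RNG keeps the split reproducible
    rng.shuffle(ids)
    test = set(ids[:n_test])
    train = set(ids[n_test:])
    return train, test


# Hypothetical IDs: 28 training volunteers plus 4 held-out volunteers.
volunteers = [f"V{i:02d}" for i in range(1, 33)]
train_ids, test_ids = split_by_volunteer(volunteers, n_test=4)
print(len(train_ids), len(test_ids))
```

Splitting by volunteer rather than by image avoids leaking near-identical frames from the same person into both sets, which would overstate generalization.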

Index terms

Medical Robots and Systems; Computer Vision for Medical Robotics