ICRA 2026

Breaking the Latency Barrier: Synergistic Perception and Control for High-Frequency 3D Ultrasound Servoing

Yizhao Qian, Yujie Zhu, Jiayuan Luo, Li Liu, Yixuan Yuan, Hongen Liao, Guochen Ning

AI summary

A tightly coupled perception-control framework enables real-time, high-frequency robotic ultrasound tracking that overcomes traditional latency bottlenecks.
Robotic ultrasound; High-frequency control; Visual servoing; Flow matching; Sim-to-real transfer; Dynamic tracking

Problem

Existing robotic ultrasound systems cannot track fast-moving anatomical targets in real time, because their end-to-end latency is high and their perception and control modules are designed in isolation. Current methods either react too slowly or rely on iterative policies whose inference cannot keep pace with the 60 Hz update rate of medical ultrasound hardware.

Approach

The authors co-design a Decoupled Dual-Stream Perception Network for rapid 3D state estimation from 2D images and a Single-Step Flow Policy that generates full action sequences in one forward pass, eliminating iterative inference delays.
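The difference between iterative and single-pass action generation can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `ToyVelocityNet` is a hand-written stand-in rather than the authors' learned policy, `HORIZON` and `ACTION_DIM` are hypothetical sizes, and the training procedure that would make a single step accurate is not shown. The sketch only demonstrates why one forward pass is cheaper than a multi-step rollout.

```python
import numpy as np

HORIZON = 8      # hypothetical number of future actions in one sequence
ACTION_DIM = 3   # 3D translational command, matching the 3D state estimate


class ToyVelocityNet:
    """Stand-in for a learned velocity network; counts its forward passes."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=(ACTION_DIM, ACTION_DIM))
        self.calls = 0

    def __call__(self, actions, t, state):
        self.calls += 1
        # Toy velocity: pull every action in the sequence toward the state
        # estimate, plus a small time-dependent linear term.
        return (state - actions) + (actions @ self.w) * (1.0 - t)


def iterative_flow_sample(net, noise, state, steps):
    """Euler integration from t=0 to t=1; costs `steps` forward passes."""
    x = noise.copy()
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * net(x, k * dt, state)
    return x


def single_step_sample(net, noise, state):
    """One forward pass over the whole interval (a single Euler step of size 1)."""
    return noise + net(noise, 0.0, state)


if __name__ == "__main__":
    state = np.array([1.0, 2.0, 3.0])          # hypothetical 3D state estimate
    noise = np.zeros((HORIZON, ACTION_DIM))    # starting sample

    slow_net = ToyVelocityNet()
    iterative_flow_sample(slow_net, noise, state, steps=10)

    fast_net = ToyVelocityNet()
    single_step_sample(fast_net, noise, state)

    print("forward passes, iterative:", slow_net.calls)
    print("forward passes, single step:", fast_net.calls)
```

In this toy setup the iterative sampler's cost grows linearly with the number of integration steps, while the single-step variant always costs one network evaluation per control cycle.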

Key results

  • Achieves over 60 Hz closed-loop control frequency for dynamic tracking
  • Maintains mean tracking error below 6.5 mm on complex 3D phantom trajectories
  • Tracks targets moving at up to 102 mm/s with a terminal error under 1.7 mm
  • Successfully re-acquires targets after displacements exceeding 170 mm
  • Enables sample-efficient Sim-to-Real transfer using only 50 expert phantom trajectories
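To put the control rate and target speed in the list above into perspective, the short calculation below works out how much time one control cycle allows and how far a target travels during it. This is back-of-the-envelope arithmetic rather than an analysis from the paper, and the 10 Hz comparison rate is a hypothetical value chosen only for contrast.

```python
def cycle_period_ms(rate_hz: float) -> float:
    """Time available for one perception-control cycle, in milliseconds."""
    return 1000.0 / rate_hz


def drift_per_cycle_mm(speed_mm_s: float, rate_hz: float) -> float:
    """Distance a target moving at constant speed covers during one cycle."""
    return speed_mm_s / rate_hz


if __name__ == "__main__":
    target_speed = 102.0   # mm/s, maximum target speed from the results list
    fast_rate = 60.0       # Hz, closed-loop rate from the results list
    slow_rate = 10.0       # Hz, hypothetical slower loop for comparison

    for rate in (fast_rate, slow_rate):
        print(
            f"{rate:.0f} Hz: {cycle_period_ms(rate):.1f} ms per cycle, "
            f"{drift_per_cycle_mm(target_speed, rate):.1f} mm target motion per cycle"
        )
```

At 60 Hz the whole loop has roughly 16.7 ms per cycle, and a target at 102 mm/s moves about 1.7 mm between updates. At the hypothetical 10 Hz rate it would move about 10.2 mm.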

Why it matters

This framework bridges the critical latency gap in robotic ultrasound, paving the way for reliable autonomous imaging during unpredictable patient motion.

Abstract

Tracking moving anatomical targets with robotic ultrasound is particularly challenging when the target motion is both fast and large in scale, as the end-to-end latency of existing systems prevents the perception–control loop from closing fast enough. In this paper, we argue that overcoming this limitation calls for the joint design of perception and control, rather than optimizing each in isolation. We present a tightly-coupled framework with two main components: (1) a Decoupled Dual-Stream Perception Network that estimates 3D translational state from 2D ultrasound images at high frequency, and (2) a Single-Step Flow Policy that outputs an entire action sequence in one forward pass, removing the need for iterative rollouts used in conventional policies. Together, the two modules enable closed-loop control at over 60 Hz. In phantom experiments with complex 3D trajectories, the system achieves a mean tracking error below 6.5 mm and re-acquires the target after resultant displacements exceeding 170 mm. It tracks targets moving at speeds up to 102 mm/s with a terminal error under 1.7 mm. In-vivo trials on a human volunteer further confirm that the approach transfers to realistic clinical conditions. To our knowledge, this is the first RUSS framework to unify high-bandwidth dynamic tracking with large-scale repositioning within a single architecture, offering a concrete step toward autonomous ultrasound operation in the presence of patient motion.

Index terms

Medical Robots and Systems; Imitation Learning; Computer Vision for Medical Robotics
