Real-Time Millimeter-Accurate Underwater Pose Estimation Via Tightly-Coupled Fusion of Vision and Optical Tracking
Yuer Gao, Tongqing Xu, Yi Cai
AI summary
Problem
Underwater robotic applications require precise, high-frequency localization for agile control, but existing sensors face a fundamental speed-accuracy trade-off, with vision methods drifting over time and high-accuracy optical or acoustic systems lacking sufficient update rates.
Approach
A tightly-coupled Extended Kalman Filter fuses a high-frequency monocular vision pose estimator, augmented with a learned latent dynamics model to compensate for underwater disturbances, with periodic high-accuracy corrections from an external optical tracking system.
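The propagate-then-correct pattern described above can be illustrated with a deliberately reduced example. The sketch below is not the authors' filter: it assumes a one-dimensional position state, a linear model (for which the EKF reduces to an ordinary Kalman filter), and no latent dynamics term. The class `ScalarFusionFilter`, the function `run_demo`, the noise values, and the per-frame drift are all hypothetical and chosen only to make the behaviour visible.

```python
from dataclasses import dataclass


@dataclass
class ScalarFusionFilter:
    """1-D Kalman filter: vision increments propagate, optical fixes correct."""
    x: float = 0.0   # position estimate (mm)
    p: float = 1.0   # estimate variance (mm^2)
    q: float = 0.05  # assumed process noise added per vision step (mm^2)
    r: float = 0.01  # assumed optical measurement noise (mm^2)

    def predict(self, vision_delta: float) -> None:
        # High-rate step: integrate the displacement reported by vision.
        self.x += vision_delta
        self.p += self.q

    def correct(self, optical_pos: float) -> None:
        # Low-rate step: standard Kalman update with an absolute position fix.
        k = self.p / (self.p + self.r)
        self.x += k * (optical_pos - self.x)
        self.p *= (1.0 - k)


def run_demo(steps: int = 62, correction_every: int = 10):
    """Return (vision-only error, fused error) in mm after `steps` frames."""
    true_step = 1.0  # hypothetical true motion per frame (mm)
    bias = 0.1       # hypothetical vision drift per frame (mm)
    fused = ScalarFusionFilter()
    vision_only = 0.0
    truth = 0.0
    for i in range(1, steps + 1):
        truth += true_step
        delta = true_step + bias      # biased vision increment
        vision_only += delta          # dead reckoning accumulates the bias
        fused.predict(delta)
        if i % correction_every == 0:
            fused.correct(truth)      # noiseless optical fix, for determinism
    return abs(vision_only - truth), abs(fused.x - truth)


if __name__ == "__main__":
    vision_err, fused_err = run_demo()
    print(f"vision-only error: {vision_err:.2f} mm, fused error: {fused_err:.2f} mm")
```

In this toy setting the vision-only estimate accumulates the per-frame bias without bound, whereas the fused estimate is pulled back toward the optical fix at every correction, so its error stays bounded by the drift accumulated between fixes.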
Key results
- Achieves 5.65 mm position RMSE at 62 FPS in controlled underwater tests
- Improves accuracy by 1.6× over the best baseline (EfficientPose + EKF, 9.20 mm RMSE) and by 6.4× over vision-only estimation (36 mm RMSE)
- Introduces a neural network-based latent dynamics variable to implicitly compensate for unmodeled hydrodynamic disturbances
- Provides, upon request, a synchronized underwater localization dataset with video, control inputs, and high-precision optical ground truth
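The improvement factors quoted in the results follow directly from the reported RMSE values, as the short check below shows. The `position_rmse` helper assumes the common definition of position RMSE as the root mean square of per-frame Euclidean errors; the source does not spell out its exact definition, so treat that function (and both function names) as illustrative.

```python
import math


def position_rmse(errors_mm):
    """RMSE over per-frame 3-D position errors, each an (ex, ey, ez) tuple in mm.

    Assumed definition: sqrt(mean(||e||^2)).
    """
    squared = [ex * ex + ey * ey + ez * ez for ex, ey, ez in errors_mm]
    return math.sqrt(sum(squared) / len(squared))


def improvement_factor(reference_rmse_mm, fused_rmse_mm):
    """How many times smaller the fused RMSE is than a reference RMSE."""
    return reference_rmse_mm / fused_rmse_mm


# RMSE values as reported in the abstract (mm).
FUSED_RMSE_MM = 5.65
BASELINE_RMSE_MM = 9.20      # EfficientPose + EKF
VISION_ONLY_RMSE_MM = 36.0

if __name__ == "__main__":
    print(round(improvement_factor(BASELINE_RMSE_MM, FUSED_RMSE_MM), 1))     # 1.6
    print(round(improvement_factor(VISION_ONLY_RMSE_MM, FUSED_RMSE_MM), 1))  # 6.4
```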
Why it matters
Enables high-fidelity, real-time state estimation critical for validating control algorithms and enabling precise underwater manipulation in laboratory testbeds.
Abstract
Precise and high-frequency state estimation is required for advanced underwater robotic applications such as physical interaction and agile control, yet no single sensor can simultaneously provide both high accuracy and high update rates. Vision-based methods offer high-frequency updates but suffer from drift, while optical tracking systems are highly accurate but may not provide sufficiently high update rates for real-time control loops. This letter presents a tightly-coupled sensor fusion framework that combines a high-frequency (62 FPS) monocular vision-based pose estimator with a high-accuracy (millimeter-level) optical tracking system. Our approach uses a visual estimator for high-frequency state propagation, with a latent variable motion model to compensate for underwater disturbances, while the optical tracker provides periodic corrections. In a controlled underwater testbed, this achieves a position RMSE of 5.65 mm at 62 FPS, improving accuracy 1.6× compared to the best baseline method (EfficientPose + EKF: 9.20 mm) and 6.4× compared to vision-only estimation (36 mm). Our dataset and code are available upon request.