StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes
Zhengri Wu, Yiran Wang, Yu Wen, Zeyu Zhang, Biao Wu, Hao Tang
AI summary
Problem
Underwater stereo depth estimation suffers from severe domain shifts and data scarcity, making it difficult to adapt large vision foundation models or fuse monocular priors with fragile stereo correspondences without extensive labeled data.
Approach
The framework uses a LoRA-adapted monocular foundation encoder to generate coarse depth priors, which guide a recurrent GRU-based stereo refinement module trained entirely without dense labels.
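As a rough illustration of what "LoRA-adapted" means, the sketch below wraps a frozen weight matrix with a trainable low-rank update. It is not the authors' code: the class name `LoRALinear`, the rank, the `alpha` scaling, and the initialization are assumptions taken from the common LoRA formulation, and the paper's dynamic rank selection is not shown.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical helper for illustration; not part of StereoAdapter's code.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                 # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, rank))                   # trainable up-projection, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 128))   # stand-in for one encoder projection
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 128))
y = layer(x)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only `A` and `B` (768 values here, versus 8192 in `W`) would be updated during adaptation.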
Key results
- State-of-the-art zero-shot RMSE of 2.8947 on the TartanAir underwater subset
- RMSE of 1.8843 on the SQUID dataset, with improved threshold accuracy
- Dynamic LoRA strategy for efficient rank selection and adaptation
- UW-StereoDepth-40K synthetic dataset and validated BlueROV2 deployment
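For readers unfamiliar with the two metrics named above, here is a minimal sketch of how RMSE and threshold accuracy are commonly computed for depth maps. The function name `depth_metrics`, the 1.25 threshold, and the convention that non-positive ground truth marks invalid pixels are assumptions; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def depth_metrics(pred, gt, threshold=1.25):
    """Return (RMSE, threshold accuracy) over pixels with valid ground truth.

    Assumes predictions are strictly positive and gt <= 0 marks invalid pixels.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))

    # Fraction of pixels whose prediction is within a factor `threshold` of gt.
    ratio = np.maximum(pred / gt, gt / pred)
    accuracy = float(np.mean(ratio < threshold))
    return rmse, accuracy


# Toy values: the last pixel has no ground truth and is ignored.
rmse, acc = depth_metrics(pred=[1.0, 2.0, 6.0, 3.0], gt=[1.0, 2.0, 4.0, 0.0])
```

Lower RMSE and higher threshold accuracy both indicate better depth estimates.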
Why it matters
StereoAdapter offers a scalable approach to accurate 3D perception that needs no dense depth labels, supporting autonomy and safety in underwater robotics and ROV operations.
Abstract
Underwater stereo depth estimation provides accurate 3D geometry for robotics tasks such as navigation, inspection, and mapping, offering metric depth from low-cost passive cameras while avoiding the scale ambiguity of monocular methods. However, existing approaches face two critical challenges: (i) parameter-efficiently adapting large vision foundation encoders to the underwater domain without extensive labeled data, and (ii) tightly fusing globally coherent but scale-ambiguous monocular priors with locally metric yet photometrically fragile stereo correspondences. To address these challenges, we propose StereoAdapter, a parameter-efficient self-supervised framework that integrates a LoRA-adapted monocular foundation encoder with a recurrent stereo refinement module. We further introduce dynamic LoRA adaptation for efficient rank selection and pre-training on the synthetic UW-StereoDepth-40K dataset to enhance robustness under diverse underwater conditions. Comprehensive evaluations on both simulated and real-world benchmarks show improvements of 6.11% on TartanAir and 5.12% on SQUID compared to state-of-the-art methods, while real-world deployment with the BlueROV2 robot further demonstrates the consistent robustness of our approach.
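The abstract's point that stereo yields metric depth rests on the standard rectified-stereo relation: depth = focal length × baseline / disparity. The sketch below applies that relation. The function name and the camera values (an 800-pixel focal length and a 0.1 m baseline) are illustrative assumptions, not parameters reported for StereoAdapter or the BlueROV2 rig.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m, min_disp=1e-6):
    """Convert rectified-stereo disparity (pixels) to metric depth (meters).

    Pixels with disparity at or below `min_disp` are assigned infinite depth.
    """
    disparity_px = np.asarray(disparity_px, dtype=float)
    depth = np.full(disparity_px.shape, np.inf)
    valid = disparity_px > min_disp
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth


# Hypothetical camera: 800 px focal length, 10 cm baseline.
depth_m = disparity_to_depth([40.0, 8.0, 0.0], focal_px=800.0, baseline_m=0.1)
```

Because depth varies inversely with disparity, a small matching error at low disparity translates into a large depth error. This is one reason photometrically fragile correspondences are costly at range.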