Fast Monocular Depth Estimation for Underwater Robotics Leveraging Attenuation Differences As Supplementary Information
Hao Wang, Liang Lu, Yan Dong, Bin Han
AI summary
Problem
Monocular depth estimation degrades in underwater environments because of wavelength-dependent light attenuation. Existing methods either rely solely on RGB input or rely solely on attenuation cues, resulting in poor accuracy and high computational costs.
Approach
The network extracts features from both standard RGB images and an Attenuation Information Space, fuses them using a lightweight FusionMamba module, and refines predictions with a micro Vision Transformer for efficient deployment.
Key results
- First framework to deeply integrate underwater attenuation features with RGB data for depth estimation
- Reduces model parameters by 10% and boosts inference speed by 43% over state-of-the-art fast methods
- Introduces a physics-based Attenuation Prior Loss to explicitly model underwater light absorption
- Delivers superior depth accuracy on USOD10K and FLSea datasets compared to existing lightweight networks
Why it matters
Provides a computationally efficient and accurate depth perception solution for resource-constrained underwater robotic systems.
Abstract
Underwater and in-air environments exhibit distinct imaging characteristics, which should be carefully considered and effectively exploited for accurate depth estimation. In this work, we analyze the effectiveness of wavelength-dependent attenuation for underwater depth estimation and show that it is helpful but insufficient for depth estimation on its own. Therefore, we propose a fast underwater monocular depth estimation network that incorporates the underwater light absorption difference (ULAD) as supplementary information. Compared with methods that rely solely on RGB input, the proposed approach provides more accurate depth predictions. In our network, RGB and ULAD features are extracted by MobileNetV4 and fused using FusionMamba, followed by decoding and refinement with a micro Vision Transformer. The network is trained on the USOD10K dataset and evaluated on both its test set and the FLSea dataset. Experimental results demonstrate that our method achieves more accurate depth estimation and higher efficiency than other lightweight networks. Furthermore, compared with existing state-of-the-art fast underwater depth estimation methods, our network reduces the number of parameters by 10% and improves inference speed by 43%. The source code and pretrained models are available at https://github.com/Sillear/ULAD-Depth
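To make the idea of an attenuation-difference cue concrete, the following is a minimal sketch and not the paper's ULAD definition, which this abstract does not spell out. It assumes a simple Beer-Lambert attenuation model and uses the common "max(G, B) minus R" channel difference as a stand-in cue. The function names `simulate_attenuation` and `attenuation_difference` and the per-channel coefficients in `beta` are hypothetical values chosen only for illustration.

```python
import numpy as np


def simulate_attenuation(radiance, distance_m, beta=(0.6, 0.1, 0.05)):
    """Attenuate an H x W x 3 RGB image with a Beer-Lambert model.

    beta holds hypothetical per-channel attenuation coefficients (1/m)
    in R, G, B order; red is assumed to be absorbed fastest.
    """
    radiance = np.asarray(radiance, dtype=np.float64)
    # Broadcasting applies one coefficient per colour channel.
    return radiance * np.exp(-np.asarray(beta) * distance_m)


def attenuation_difference(rgb):
    """Illustrative cue: max(G, B) - R per pixel (an assumed stand-in)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    red = rgb[..., 0]
    green_blue = np.maximum(rgb[..., 1], rgb[..., 2])
    return green_blue - red


# A white patch viewed through 1 m and 3 m of water.
white = np.ones((2, 2, 3))
near = attenuation_difference(simulate_attenuation(white, 1.0))
far = attenuation_difference(simulate_attenuation(white, 3.0))
print(near[0, 0] < far[0, 0])
```

Under these assumptions the cue is larger for the more distant patch, which is why such a signal can help. It also depends on the unknown scene colour, since a reddish object and a white object at the same distance give different values. That is one plausible reason a cue of this kind would be paired with RGB features rather than used alone.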