ICRA 2026

Fast Monocular Depth Estimation for Underwater Robotics Leveraging Attenuation Differences As Supplementary Information

Hao Wang, Liang Lu, Yan Dong, Bin Han

AI summary

Fusing RGB imagery with underwater light absorption cues via a lightweight Mamba-based network yields faster and more accurate depth estimation for underwater robotics.

underwater depth estimation, monocular depth, Mamba network, light attenuation, robotics, feature fusion

Problem

Monocular depth estimation degrades in underwater environments because of wavelength-dependent light attenuation. Existing methods either rely solely on RGB input or rely solely on attenuation cues, resulting in poor accuracy and high computational costs.

Approach

The network extracts features from both standard RGB images and an Attenuation Information Space, fuses them using a lightweight FusionMamba module, and refines predictions with a micro Vision Transformer for efficient deployment.
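This summary does not say how the attenuation input is computed, so the following is only a minimal sketch of one plausible cue. It assumes the input resembles the familiar red-versus-blue/green channel difference: red light is absorbed fastest in water, so the gap between the stronger of the green and blue channels and the red channel tends to widen with range. The function name `attenuation_difference` and the formula itself are illustrative assumptions, not the authors' definition.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), each value in [0, 1]


def attenuation_difference(image: List[List[Pixel]]) -> List[List[float]]:
    """Per-pixel max(G, B) - R.

    Hypothetical cue: larger values suggest stronger red absorption,
    which tends to correlate with longer light paths through water.
    """
    return [[max(g, b) - r for (r, g, b) in row] for row in image]


# Tiny 1x2 image: a reddish pixel and a blue-green pixel.
image = [[(0.8, 0.5, 0.4), (0.1, 0.5, 0.6)]]
cue = attenuation_difference(image)
```

In a pipeline like the one described, a map of this kind would be passed to its own feature extractor alongside the RGB image rather than being read as depth directly.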

Key results

First framework to deeply integrate underwater attenuation features with RGB data for depth estimation
Reduces model parameters by 10% and boosts inference speed by 43% over state-of-the-art fast methods
Introduces a physics-based Attenuation Prior Loss to explicitly model underwater light absorption
Delivers superior depth accuracy on USOD10K and FLSea datasets compared to existing lightweight networks
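
To make the efficiency figures concrete, here is a small worked calculation. The baseline values are hypothetical placeholders, not numbers from the paper, and the sketch assumes "inference speed" means throughput (frames per second). Under that reading, a 43% throughput gain corresponds to roughly a 30% reduction in per-frame latency, not 43%.

```python
# Hypothetical baseline, chosen only to make the arithmetic easy to follow.
baseline_params_m = 10.0   # millions of parameters (placeholder)
baseline_fps = 100.0       # frames per second (placeholder)

# Relative changes quoted in the summary.
param_reduction = 0.10
speed_gain = 0.43

params_m = baseline_params_m * (1 - param_reduction)  # 10% fewer parameters
fps = baseline_fps * (1 + speed_gain)                 # 43% higher throughput

# Per-frame latency is the inverse of throughput, so the relative
# latency drop is 1 - 1/1.43, independent of the chosen baseline.
latency_drop = 1 - baseline_fps / fps

print(f"params: {params_m:.1f} M, fps: {fps:.1f}, latency drop: {latency_drop:.1%}")
```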

Why it matters

Provides a computationally efficient and accurate depth perception solution for resource-constrained underwater robotic systems.

Abstract

Underwater and in-air environments exhibit distinct imaging characteristics, which should be carefully considered and effectively exploited for accurate depth estimation. In this work, we analyze the effectiveness of wavelength-dependent attenuation for underwater depth estimation and show that it is helpful but insufficient to perform depth estimation independently. Therefore, we propose a fast underwater monocular depth estimation network that incorporates underwater light absorption difference (ULAD) as supplementary information. Compared with methods that rely solely on RGB input, the proposed approach provides more accurate depth predictions. In our network, RGB and ULAD features are extracted by MobileNetV4 and fused using FusionMamba, followed by decoding and refinement with a micro Vision Transformer. The network is trained on the USOD10K dataset and evaluated on both its test set and the FLSea dataset. Experimental results demonstrate that our method achieves more accurate depth estimation and higher efficiency compared with other lightweight networks. Furthermore, compared with existing state-of-the-art fast underwater depth estimation methods, our network reduces the number of parameters by 10% and improves inference speed by 43%. The source code and pretrained models are available at https://github.com/Sillear/ULAD-Depth
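
The wavelength-dependent attenuation mentioned in the abstract can be illustrated with a simple exponential (Beer-Lambert style) model, in which each color channel is attenuated as exp(-beta * d) over a path of length d. The sketch below is an assumption-laden illustration, not the paper's formulation: the coefficients in `BETA` are made-up values chosen only so that red is absorbed fastest, and the helper names are hypothetical.

```python
import math

# Illustrative attenuation coefficients in 1/m (NOT measured values).
BETA = {"red": 0.60, "green": 0.08, "blue": 0.05}


def transmission(channel: str, distance_m: float) -> float:
    """Fraction of light in `channel` that survives `distance_m` of water."""
    return math.exp(-BETA[channel] * distance_m)


def blue_red_log_ratio(distance_m: float) -> float:
    """log(t_blue / t_red), which equals (beta_red - beta_blue) * distance."""
    return math.log(transmission("blue", distance_m) / transmission("red", distance_m))


# The log ratio grows linearly with distance under this model.
ratios = [blue_red_log_ratio(d) for d in (1.0, 2.0, 4.0)]
```

Under this idealized model the channel ratio encodes range, but only if the coefficients and the unattenuated scene color are known. In practice both vary with water type and scene content, which is one plausible reading of why the abstract describes attenuation as helpful but insufficient on its own.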

Index terms

Computer Vision for Automation; RGB-D Perception; Deep Learning Methods