ICRA 2026

Self-Supervised Underwater Monocular Depth Estimation Informed by Multi-Physics Processes

Fengqi Xiao, Juntian Qu

AI summary

Integrating underwater optical physics into a self-supervised learning loop significantly improves depth estimation accuracy and generalization without requiring ground-truth data.

Underwater depth estimation, Self-supervised learning, Optical degradation, Inherent optical properties, Monocular vision, Underwater robotics

Problem

Existing self-supervised monocular depth estimation methods fail to generalize to underwater environments due to complex light attenuation and uniform scene content, while obtaining ground-truth depth underwater is highly impractical.

Approach

The authors propose MP-UMono, which combines ego-motion constraints with a physically guided image restoration process and a further degradation process, both driven by inherent optical properties (IOPs), to form a closed self-supervised learning loop.
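To make the restoration and further degradation idea concrete, here is a minimal NumPy sketch under a common simplified image formation model, I = J*exp(-beta*d) + B*(1 - exp(-beta*d)), with one per-channel attenuation coefficient beta and a per-channel veiling light B. This is an assumption: the summary does not give the exact IOP-driven model MP-UMono uses, and the function names degrade, restore and further_degrade are hypothetical. The sketch shows one property a further degradation step can rely on: passing an observed image through an extra water column of length delta gives the same result as imaging the clean scene at depth d + delta.

```python
import numpy as np


def degrade(J, depth, beta, B):
    """Simplified underwater formation model (an assumption, not the paper's exact model).

    J:     clean scene radiance, shape (H, W, 3)
    depth: range map, shape (H, W)
    beta:  per-channel attenuation coefficient, shape (3,)
    B:     per-channel veiling light (backscatter at infinity), shape (3,)
    """
    t = np.exp(-depth[..., None] * beta)  # per-channel transmission, shape (H, W, 3)
    return J * t + B * (1.0 - t)          # attenuated direct signal plus backscatter


def restore(I, depth, beta, B, t_min=1e-3):
    """Invert degrade() for known depth, beta and B; t is clamped to avoid dividing by ~0."""
    t = np.maximum(np.exp(-depth[..., None] * beta), t_min)
    return (I - B * (1.0 - t)) / t


def further_degrade(I, extra_range, beta, B):
    """Hypothetical helper: push an observed image through an extra water column."""
    return degrade(I, np.broadcast_to(extra_range, I.shape[:2]), beta, B)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J = rng.uniform(0.0, 1.0, (4, 5, 3))
    d = rng.uniform(0.5, 3.0, (4, 5))
    beta = np.array([0.40, 0.10, 0.05])  # illustrative values: red attenuates fastest
    B = np.array([0.05, 0.30, 0.40])     # illustrative blue-green veiling light
    I = degrade(J, d, beta, B)
    # Further degradation by 1.5 equals imaging the clean scene at depth d + 1.5.
    print(np.allclose(further_degrade(I, 1.5, beta, B), degrade(J, d + 1.5, beta, B)))
```

Because of this identity, one hypothetical constraint is to ask the depth estimator to predict d + delta on the further-degraded image; how the paper actually couples restoration and further degradation is not described in this summary.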

Key results

Reduces RMSE by approximately 9.1% compared to state-of-the-art methods
Improves threshold accuracy by approximately 3.5%
Establishes a self-supervised closed loop using IOP-driven restoration and degradation
Demonstrates strong cross-dataset generalization on DRUVA, USOD10K, and UIEBD
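The key results are quoted in RMSE and threshold accuracy. The sketch below uses the conventional definitions from the monocular depth literature, assuming the threshold is the usual delta < 1.25 (the summary does not say which threshold improved) and that predictions are median-scaled to ground truth, a common protocol for scale-ambiguous self-supervised models that the summary also does not confirm. Masking of invalid ground-truth pixels is omitted for brevity, and the function names are illustrative.

```python
import numpy as np


def median_scale(pred, gt):
    """Align scale-ambiguous monocular predictions to ground truth by median ratio."""
    return pred * (np.median(gt) / np.median(pred))


def rmse(pred, gt):
    """Root-mean-square error between predicted and ground-truth depth."""
    return float(np.sqrt(np.mean((pred - gt) ** 2)))


def threshold_accuracy(pred, gt, thresh=1.25):
    """Fraction of pixels where max(pred/gt, gt/pred) is below thresh."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < thresh))


def relative_change(baseline, new):
    """Signed relative change; -0.091 corresponds to a 9.1% reduction."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    gt = np.array([1.0, 2.0, 4.0])
    pred = np.array([1.0, 2.0, 6.0])
    print(rmse(pred, gt), threshold_accuracy(pred, gt))
```

If the 9.1% figure is a relative change, relative_change(baseline_rmse, new_rmse) would return about -0.091 for the reported improvement.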

Why it matters

Enables reliable, data-efficient depth perception for underwater robots and vision systems operating in challenging, unstructured aquatic environments.

Abstract

Depth information is crucial for underwater robotic detection and navigation tasks. However, the underwater imaging environment is complex and variable: the images captured by robots are typically sequences or videos with uniform scene content, and ground-truth depth is difficult to obtain. These challenges hinder the generalization of existing self-supervised monocular depth estimation (SMDE) schemes to practical underwater detection applications. To address this issue, we propose an SMDE method for underwater images informed by the physical process of optical degradation. Specifically, we develop a further degradation process for underwater images that constrains the image restoration process used to solve for the attenuation coefficient and depth map, and we combine it with an ego-motion-based framework to form a closed self-supervised learning loop. Guided by inherent optical properties, this closed loop learns depth cues from both the underwater image formation model and the geometric relationships involved in view transformation. Experiments show that, compared with the SOTA method, the proposed method reduces RMSE by about 9.1% and improves threshold accuracy by about 3.5%, and that it can adapt to various underwater robot detection scenarios.
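The ego-motion part of the loop follows the general idea of self-supervised view synthesis: back-project target pixels with the predicted depth, move them with the relative camera pose, re-project them into a neighbouring frame, and penalise the photometric difference. The NumPy sketch below assumes known intrinsics K and pose (R, t) rather than a pose network, uses nearest-neighbour sampling and a plain L1 loss, and its function names are hypothetical; practical frameworks in the Monodepth2 style use differentiable bilinear sampling and combine L1 with SSIM.

```python
import numpy as np


def source_coords(depth, K, R, t):
    """Map every target pixel into the source view using depth and relative pose.

    depth: (H, W) target-view depth; K: (3, 3) intrinsics;
    R, t:  rotation (3, 3) and translation (3,) from target to source camera.
    Returns a (2, H, W) array of source-image (u, v) coordinates.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel grid, each (H, W)
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    points = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)  # back-project to 3D
    points_src = R @ points + t.reshape(3, 1)                 # move into source frame
    proj = K @ points_src                                     # project with intrinsics
    return (proj[:2] / proj[2:3]).reshape(2, H, W)


def warp_nearest(src, coords):
    """Sample the source image at coords (nearest neighbour, clamped at the borders)."""
    H, W = src.shape[:2]
    u = np.clip(np.rint(coords[0]).astype(int), 0, W - 1)
    v = np.clip(np.rint(coords[1]).astype(int), 0, H - 1)
    return src[v, u]


def photometric_l1(target, src, depth, K, R, t):
    """Mean absolute difference between the target and the source warped into it."""
    warped = warp_nearest(src, source_coords(depth, K, R, t))
    return float(np.mean(np.abs(target - warped)))


if __name__ == "__main__":
    K = np.array([[10.0, 0.0, 2.0], [0.0, 10.0, 2.0], [0.0, 0.0, 1.0]])
    depth = np.full((4, 6), 5.0)
    coords = source_coords(depth, K, np.eye(3), np.array([0.5, 0.0, 0.0]))
    print(coords[0])  # each column index shifted by fx * tx / depth = 1 pixel
```

For a pure sideways translation t_x at constant depth d, the source coordinates shift by f_x * t_x / d pixels; this depth-dependent parallax is the geometric cue the view-synthesis loss exposes to the depth network.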

Index terms

RGB-D Perception, Computer Vision for Transportation, Deep Learning for Visual Perception