ICRA 2026

MV3D: Multi-View 3D Reconstruction of Objects Using Forward-Looking Sonar

Nael Jaber, Bilal Wehbe, Leif Christensen, Frank Kirchner

AI summary

The proposed deep learning model accurately reconstructs 3D underwater objects from a batch of 2D forward-looking sonar images, achieving a 0.06-meter average chamfer distance error in real-world tests.

Forward-looking sonar, 3D reconstruction, underwater robotics, deep learning, multi-view depth, simulation-to-real

Problem

Forward-looking sonars output 2D images that lack elevation data, making 3D reconstruction difficult due to ambiguous 2D-to-3D correspondences and a scarcity of real-world training datasets.
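The elevation ambiguity can be illustrated with a minimal sketch. It assumes the common spherical sonar model, in which a point is described by range, bearing, and elevation, and the image records only range and bearing. The function names and the numeric values are illustrative assumptions, not taken from the paper.

```python
import math

def spherical_to_cartesian(r, theta, phi):
    """Sonar-frame point from range r, bearing theta, elevation phi (radians)."""
    return (r * math.cos(phi) * math.cos(theta),
            r * math.cos(phi) * math.sin(theta),
            r * math.sin(phi))

def sonar_measurement(point):
    """Assumed image formation: the sonar records range and bearing only."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)
    return r, theta

# Two points with the same range and bearing but different elevation
# (values chosen for illustration only).
p_low = spherical_to_cartesian(5.0, math.radians(10), math.radians(-5))
p_high = spherical_to_cartesian(5.0, math.radians(10), math.radians(7))

m_low = sonar_measurement(p_low)
m_high = sonar_measurement(p_high)
print(m_low, m_high)  # both measurements agree up to floating-point rounding
```

Both points land on the same image location even though their heights differ, which is why the elevation angle has to be estimated rather than read from the image.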

Approach

An encoder-decoder network extracts features from a batch of 24 sonar images to predict eight multi-view depth maps, which are converted into a complete 3D point cloud. A Cycle-GAN adapts the synthetic training data to match the style of real acoustic images.
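The step from multi-view depth maps to a single point cloud can be sketched as follows. This summary does not state the projection model or the viewpoint layout used in the paper, so the sketch assumes pinhole-style virtual views with known intrinsics (`fx`, `fy`, `cx`, `cy`) and known poses (`R`, `t`); all names and values here are hypothetical.

```python
def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth map (rows of metres) to 3D points in the view frame."""
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0:  # assumed convention: non-positive depth means no surface
                continue
            points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points

def to_common_frame(points, R, t):
    """Apply rotation R (3x3 nested list) and translation t to each point."""
    return [tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
            for p in points]

def fuse_views(views):
    """Concatenate the back-projected points of all views in one common frame."""
    cloud = []
    for depth, intrinsics, R, t in views:
        cloud.extend(to_common_frame(backproject(depth, *intrinsics), R, t))
    return cloud

# Toy example: two 2x2 depth maps seen from opposite sides of the object.
depth = [[1.0, 0.0],
         [0.0, 2.0]]
intrinsics = (1.0, 1.0, 0.5, 0.5)
front = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0.0, 0.0, 0.0))
back = ([[-1, 0, 0], [0, 1, 0], [0, 0, -1]], (0.0, 0.0, 4.0))  # 180 degrees about y

cloud = fuse_views([(depth, intrinsics, *front), (depth, intrinsics, *back)])
print(len(cloud))
```

With eight predicted depth maps, the same loop would run over eight views; covering the object from several sides is what lets the fused cloud represent surfaces that a single view cannot see.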

Key results

Predicts multi-view depth maps from a linear scan batch
Achieves accurate 3D reconstruction across basic and complex geometries in simulation
Validates in real underwater environments with 0.06 m average chamfer distance error
Bridges simulation-to-real gap via Cycle-GAN style transfer
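For readers unfamiliar with the metric, the chamfer distance between a predicted and a ground-truth point cloud can be sketched as below. Several variants exist (squared or unsquared distances, summed or averaged directions), and this summary does not say which one the paper uses; the version here, a symmetric mean of unsquared nearest-neighbour distances, is an assumption.

```python
import math

def chamfer_distance(a, b):
    """Symmetric mean nearest-neighbour distance between point sets a and b."""
    def one_way(src, dst):
        # Mean distance from each source point to its closest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return 0.5 * (one_way(a, b) + one_way(b, a))

# Toy clouds: the prediction is the ground truth shifted by 0.1 m along z.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]

print(chamfer_distance(predicted, ground_truth))
```

This brute-force version is quadratic in the number of points; for dense clouds such as laser scans, a spatial index like a k-d tree would normally replace the inner `min`.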

Why it matters

Provides a practical, high-accuracy 3D mapping solution for underwater vehicles operating in turbid or dark waters where optical sensors fail.

Abstract

This work proposes a method for learning features from a batch of 2D sonar images to predict a multi-view point cloud and thereby achieve dense 3D reconstruction. Compared with vision-based sensors, acoustic sensors are considered a reliable sensing modality in underwater environments. However, a sonar outputs a 2D image, which cannot represent the scanned scene in all three dimensions. Estimating the missing information, known as the elevation angle, is the key to performing 3D reconstruction from acoustic images. One approach is to predict a depth map from the 2D sonar image and transform it into a point cloud. In this letter, this idea is extended to learning features from a batch of 2D acoustic images and predicting multiple depth maps of the scanned object that cover it from different viewpoints. Because datasets from real environments are lacking, the training data for the deep learning model was generated synthetically. To reduce the simulation-to-real gap, a Cycle-GAN was trained on real images to transfer their realistic style to the synthetically generated images. Experiments in simulation showed that the proposed method is able to perform dense 3D reconstruction. The approach was then tested in a real environment using an underwater vehicle, where it accurately reconstructed the scanned objects in 3D, achieving an average chamfer distance error of 0.06 meters when compared to a laser-scanned ground truth.

Index terms

Marine Robotics; Deep Learning Methods; Deep Learning for Visual Perception