ICRA 2026

Mapping Pamir: Multi-Session Visual/Inertial SLAM and 3D Reconstruction of an Underwater Shipwreck

Michalis Chatzispyrou, Luke Horgan, Hyunkil Hwang, Harish Sathishchandra, Chinmay Burgul, Monika Roznere, Alberto Quattrini Li, Philippos Mordohai, Ioannis Rekleitis

AI summary

Affordable action cameras paired with dive computers and a hybrid SLAM-SfM pipeline enable accurate multi-session 3D mapping of underwater shipwrecks without expensive equipment.

Underwater SLAM, Multi-session mapping, Visual-inertial odometry, Structure-from-Motion, 3D reconstruction, Action camera

Problem

Accurate underwater 3D mapping typically requires expensive autonomous vehicles or complex stereo rigs, while multi-session mapping struggles with scale ambiguity, yaw drift, and the computational cost of processing high-frame-rate video.

Approach

The pipeline fuses off-the-shelf action camera video and dive computer depth data in the SVIn2 visual-inertial SLAM framework, which supplies keyframes and camera poses for each session. COLMAP then globally optimizes these keyframes and poses and produces a dense reconstruction. Calibration targets at fixed locations are used to align separate dive sessions.
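As a rough illustration of how dive computer depth can be tied to camera poses, the sketch below linearly interpolates a depth log at each pose timestamp and overwrites the pose's vertical coordinate. It is not the paper's implementation: the function names, the `clock_offset` parameter, the tuple layout, and the z-up convention (z = -depth) are all assumptions made for this example, and the real pipeline may fuse depth inside the SLAM optimization rather than as a post-hoc correction.

```python
from bisect import bisect_left


def interpolate_depth(depth_log, t):
    """Linearly interpolate depth (metres, positive down) at time t (seconds).

    depth_log: list of (timestamp, depth) tuples sorted by strictly
    increasing timestamp. Times outside the log are clamped to the
    first or last sample.
    """
    times = [sample[0] for sample in depth_log]
    i = bisect_left(times, t)
    if i == 0:
        return depth_log[0][1]
    if i == len(depth_log):
        return depth_log[-1][1]
    (t0, d0), (t1, d1) = depth_log[i - 1], depth_log[i]
    w = (t - t0) / (t1 - t0)
    return d0 + w * (d1 - d0)


def correct_z(poses, depth_log, clock_offset=0.0):
    """Replace the z of each (t, x, y, z) pose with -depth at t + clock_offset.

    clock_offset (hypothetical) is the camera-to-dive-computer clock shift
    in seconds. Assumes a z-up world frame with z = 0 at the surface.
    """
    return [
        (t, x, y, -interpolate_depth(depth_log, t + clock_offset))
        for t, x, y, _ in poses
    ]
```

Dive computers typically log depth far less often than a camera records frames, which is why some form of interpolation between samples is needed before the two streams can be compared.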

Key results

Absolute Z-axis depth correction via dive computer synchronization
Novel keyframe selection method eliminating SfM sub-model discontinuities
Metric scale injection into monocular bundle adjustment using VI-SLAM poses
Successful multi-session 3D reconstruction of the Pamir shipwreck from three dives
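The idea behind metric scale injection can be sketched with a simple estimator: a monocular SfM reconstruction is defined only up to scale, so the ratio between the spatial spread of the metric VI-SLAM camera positions and the spread of the corresponding SfM camera centers gives a scale factor. This is a minimal sketch under stated assumptions, not the method used in the paper; the function names are hypothetical, and the paper may instead constrain bundle adjustment directly with the VI-SLAM poses.

```python
import math


def centroid(points):
    """Mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def spread(points):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    total = sum((p[k] - c[k]) ** 2 for p in points for k in range(3))
    return math.sqrt(total / len(points))


def estimate_scale(sfm_centers, metric_centers):
    """Scale s such that s * (SfM geometry) matches the metric geometry.

    Both lists hold camera centers for the same keyframes in the same
    order. The ratio of spreads is unaffected by any rotation or
    translation between the two frames.
    """
    if len(sfm_centers) != len(metric_centers) or len(sfm_centers) < 2:
        raise ValueError("need at least two matched camera centers")
    s = spread(sfm_centers)
    if s == 0.0:
        raise ValueError("SfM camera centers are all identical")
    return spread(metric_centers) / s
```

A spread ratio is sensitive to outlier poses, so a practical system would likely prefer a robust or jointly optimized estimate; the sketch only shows where the metric information comes from.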

Why it matters

Democratizes accurate underwater 3D mapping for archaeologists and inspectors by replacing costly AUVs and stereo rigs with affordable consumer gear and open-source software.

Abstract

This paper presents a framework for multi-session mapping of underwater environments utilizing an affordable action camera. The visual-inertial data are augmented by water depth recordings from a dive computer. SVIn2, an open-source VI-SLAM framework, is utilized to generate a trajectory and a sparse reconstruction for each session. Utilizing the keyframes extracted from SVIn2 and the estimated camera poses, a Structure-from-Motion (SfM) framework, COLMAP, is employed to perform global optimization and produce a dense reconstruction of the target environment. Calibration targets at fixed locations, when available, are used to estimate the coordinate transformation between different data collection sessions, thus bringing the sessions into the same coordinate frame. The proposed pipeline is employed for the mapping of a shipwreck off the coast of Barbados. For the first time, both the exterior and the accessible interior parts of the wreck were mapped in two sessions, while a third session employed two cameras with different fields of view.
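To make the session alignment step concrete, the following sketch estimates a transformation between two sessions from matched calibration target points. It assumes, beyond what the abstract states, that both sessions are already gravity-aligned and metrically scaled (as visual-inertial estimates usually are), so that only a yaw rotation and a translation remain unknown. The paper may well estimate a more general transformation, and all names here are hypothetical.

```python
import math


def align_yaw_translation(src, dst):
    """Least-squares yaw and translation mapping src points onto dst.

    src, dst: matched lists of (x, y, z) target points observed in
    session A and session B. Returns (yaw, (tx, ty, tz)) such that
    dst ~= Rz(yaw) * src + t.
    """
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need at least two matched points")
    cs = [sum(p[k] for p in src) / n for k in range(3)]
    cd = [sum(q[k] for q in dst) / n for k in range(3)]
    num = den = 0.0
    for p, q in zip(src, dst):
        ax, ay = p[0] - cs[0], p[1] - cs[1]
        bx, by = q[0] - cd[0], q[1] - cd[1]
        num += ax * by - ay * bx  # accumulates the sin(yaw) component
        den += ax * bx + ay * by  # accumulates the cos(yaw) component
    yaw = math.atan2(num, den)
    c, s = math.cos(yaw), math.sin(yaw)
    t = (
        cd[0] - (c * cs[0] - s * cs[1]),
        cd[1] - (s * cs[0] + c * cs[1]),
        cd[2] - cs[2],
    )
    return yaw, t


def apply_transform(yaw, t, p):
    """Map a point from session A's frame into session B's frame."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1], p[2] + t[2])
```

Restricting the problem to four degrees of freedom keeps the solution closed-form and avoids disturbing the roll and pitch that the inertial measurements already constrain. If the gravity alignment of a session cannot be trusted, a full rigid or similarity alignment would be the more appropriate choice.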

Index terms

Marine Robotics, Mapping, Visual-Inertial SLAM