Mapping Pamir: Multi-Session Visual/Inertial SLAM and 3D Reconstruction of an Underwater Shipwreck
Michalis Chatzispyrou, Luke Horgan, Hyunkil Hwang, Harish Sathishchandra, Chinmay Burgul, Monika Roznere, Alberto Quattrini Li, Philippos Mordohai, Ioannis Rekleitis
AI summary
Problem
Accurate underwater 3D mapping typically requires expensive autonomous vehicles or complex stereo rigs, while multi-session mapping struggles with scale ambiguity, yaw drift, and the computational cost of processing high-frame-rate video.
Approach
The pipeline fuses off-the-shelf action camera video and dive computer depth data with the SVIn2 visual-inertial SLAM framework to extract keyframes and poses, which are then globally optimized and densely reconstructed using COLMAP, with calibration targets aligning separate dive sessions.
Key results
- Absolute Z-axis depth correction via dive computer synchronization
- Novel keyframe selection method eliminating SfM sub-model discontinuities
- Metric scale injection into monocular bundle adjustment using VI-SLAM poses
- Successful multi-session 3D reconstruction of the Pamir shipwreck from three dives
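The depth correction listed above can be illustrated with a minimal sketch. It assumes that the dive computer log is a list of (time, depth) samples, that the camera and dive computer clocks differ by a known constant offset, and that Z is positive upward with the water surface at zero. The function name `depth_corrected_z` and the parameter `clock_offset_s` are hypothetical. The summary does not describe how the synchronization is actually done, so this is one plausible reading and not the authors' method.

```python
import numpy as np

def depth_corrected_z(keyframe_times, depth_times, depths_m, clock_offset_s=0.0):
    """Interpolate dive-computer depth at each keyframe time; return Z (up-positive)."""
    keyframe_times = np.asarray(keyframe_times, dtype=float)
    depth_times = np.asarray(depth_times, dtype=float)
    depths_m = np.asarray(depths_m, dtype=float)

    # Assumption: a constant offset maps camera time onto the dive computer clock.
    t = keyframe_times + clock_offset_s

    # np.interp needs increasing sample times; it clamps outside the logged range.
    order = np.argsort(depth_times)
    depth_at_keyframes = np.interp(t, depth_times[order], depths_m[order])

    # Assumption: depth is positive downward, Z is positive upward, surface at 0.
    return -depth_at_keyframes

# Two keyframes against a three-sample depth log (seconds, metres).
z = depth_corrected_z([1.0, 2.5], [0.0, 2.0, 4.0], [10.0, 12.0, 16.0])
```

Dive computers usually log depth far more sparsely than a camera records frames, which is why the sketch interpolates instead of looking up the nearest sample.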
Why it matters
The pipeline democratizes accurate underwater 3D mapping for archaeologists and inspectors by replacing costly AUVs and stereo rigs with affordable consumer gear and open-source software.
Abstract
This paper presents a framework for multi-session mapping of underwater environments using an affordable action camera. The visual-inertial data are augmented by water depth recordings from a dive computer. SVIn2, an open-source VI-SLAM framework, is used to generate a trajectory and a sparse reconstruction for each session. Using the keyframes extracted by SVIn2 and the estimated camera poses, a Structure-from-Motion (SfM) framework, COLMAP, is employed for global optimization and to produce a dense reconstruction of the target environment. Calibration targets at fixed locations, when available, are used to estimate the coordinate transformation between data collection sessions, thus bringing all sessions into the same coordinate frame. The proposed pipeline is applied to the mapping of a shipwreck off the coast of Barbados. For the first time, both the exterior and the accessible interior parts of the wreck were mapped in two sessions, while a third session employed two cameras with different fields of view.
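The session-to-session coordinate transformation can be illustrated with a standard least-squares rigid alignment (the Kabsch method) over matched 3D points. The sketch assumes each calibration target yields a 3D point observed in both sessions, that at least three non-collinear matches are available, and that both sessions are already at metric scale, so no scale factor is estimated. The function name `rigid_transform` is hypothetical. The abstract does not state which estimator the authors use, so this shows the general technique only.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t such that dst ~= R @ src + t, from matched 3D points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Centre both point sets so that only the rotation remains to be solved.
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance

    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic check: four target points seen in session A, then in session B
# after a 30 degree yaw and a translation.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, -2.0, 0.5])
targets_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
targets_b = targets_a @ R_true.T + t_true
R_est, t_est = rigid_transform(targets_a, targets_b)
```

Applying `R_est` and `t_est` to every pose and point of session A would express it in session B's frame. With noisy target detections the estimate is a least-squares fit, not an exact recovery.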