GeVI-SLAM: Gravity-Enhanced Stereo VI SLAM for Underwater Robots
Yuan Shen, Yuze Hong, Guangyang Zeng, Tengfei Zhang, Pui Yi Chui, Ziyang Hong, Junfeng Wu
AI summary
Problem
Underwater visual-inertial SLAM struggles with visual degeneracy from sparse or repetitive textures and insufficient IMU excitation due to high water resistance, causing initialization failures and orientation drift.
Approach
The system leverages stereo depth to fix scale and uses a precise gravity prior to decouple roll and pitch, enabling a fast 4-DOF pose solver with adaptive visual-inertial fusion that dynamically weights gravity constraints based on motion dynamics.
Key results
- Gravity-enhanced 4-DOF PnP solver with a provably consistent, bias-eliminated estimator
- Adaptive visual-inertial fusion via joint pose and covariance estimation to prevent roll/pitch drift
- CRLB-tight accuracy in simulations and robust outlier rejection under 30% feature mismatches
- Lower trajectory and pose errors than ORB-SLAM3, VINS-Fusion, and SVIN2 in real-world underwater tests
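The outlier-rejection result above depends on how many random samples RANSAC must draw. As a rough illustration, the standard textbook iteration bound (not a formula taken from the paper) can be evaluated for a given inlier ratio and minimal sample size. The function name and the 0.99 confidence level are assumptions chosen for this sketch.

```python
import math


def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Draws needed so that, with probability `confidence`, at least one
    minimal sample contains only inliers. Requires 0 < inlier_ratio < 1."""
    # Probability that a single minimal sample is outlier-free.
    p_clean = inlier_ratio ** sample_size
    # Standard bound: N >= log(1 - confidence) / log(1 - p_clean).
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))
```

Under these assumptions, a smaller minimal sample needs fewer draws at the same mismatch rate, which is the general reason a low-order minimal solver keeps RANSAC cheap.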
Why it matters
Provides a reliable perception foundation for underwater robots navigating feature-sparse, low-acceleration environments where conventional SLAM systems typically fail.
Abstract
Accurate visual–inertial simultaneous localization and mapping (VI SLAM) for underwater robots remains a significant challenge due to frequent visual degeneracy and insufficient inertial measurement unit (IMU) motion excitation. In this paper, we present GeVI-SLAM, a gravity-enhanced stereo VI SLAM system designed to address these issues. By leveraging the stereo camera’s direct depth estimation ability, we eliminate the need to estimate scale during IMU initialization, enabling stable operation even under low-acceleration dynamics. With precise gravity initialization, we decouple the pitch and roll from the pose estimation and solve a 4-degree-of-freedom (DOF) Perspective-n-Point (PnP) problem for pose tracking. This allows the use of a minimal 3-point solver, which significantly reduces the computational time needed to reject outliers within a Random Sample Consensus framework. We further propose a bias-eliminated 4-DOF PnP estimator with provable consistency, ensuring that the relative pose estimate converges to the true value as the number of features increases. To handle dynamic motion, we refine the full 6-DOF pose while jointly estimating the IMU covariance, enabling adaptive weighting of the gravity prior. Extensive experiments on simulated and real-world data demonstrate that GeVI-SLAM achieves higher accuracy and greater stability than state-of-the-art methods.
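To make the 4-DOF idea concrete, the following is a minimal sketch of one possible linear formulation, not the authors' solver or their bias-eliminated estimator. It assumes roll and pitch have already been removed using the gravity direction, so the remaining rotation is a yaw about the gravity-aligned z axis. The unknowns are then cos(yaw), sin(yaw), and the translation, and each 3D-to-bearing correspondence contributes linear constraints through a cross product. All names (`solve_4dof_pose`, `solve_linear`) and the frame conventions are hypothetical, chosen for illustration.

```python
import math


def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def solve_4dof_pose(points_w, bearings):
    """Estimate yaw and translation from 3D points and gravity-levelled bearings.

    points_w: (X, Y, Z) points in a world frame whose z axis is along gravity.
    bearings: (bx, by, bz) rays in a camera frame already levelled with the
              gravity-derived roll and pitch (assumed given).
    Model: p_cam = Rz(yaw) p_w + t, and bearing x p_cam = 0.
    """
    # Unknown vector x = [cos(yaw), sin(yaw), tx, ty, tz].
    N = [[0.0] * 5 for _ in range(5)]  # normal matrix, sum of row^T row
    g = [0.0] * 5                      # normal right-hand side
    for (X, Y, Z), (bx, by, bz) in zip(points_w, bearings):
        # p_cam = A x + d with d = (0, 0, Z).
        A = [[X, -Y, 1.0, 0.0, 0.0],
             [Y, X, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 0.0, 1.0]]
        # Skew-symmetric matrix of the bearing, so that S v = bearing x v.
        S = [[0.0, -bz, by], [bz, 0.0, -bx], [-by, bx, 0.0]]
        for i in range(3):
            row = [sum(S[i][k] * A[k][j] for k in range(3)) for j in range(5)]
            rhs = -S[i][2] * Z  # i-th component of -(S d)
            for a in range(5):
                g[a] += row[a] * rhs
                for b in range(5):
                    N[a][b] += row[a] * row[b]
    c, s, tx, ty, tz = solve_linear(N, g)
    # The constraint c^2 + s^2 = 1 is not enforced in the linear solve;
    # atan2 only extracts the angle from the unconstrained solution.
    return math.atan2(s, c), (tx, ty, tz)
```

Each correspondence yields two independent equations in five unknowns, so three points in general position are enough to determine this linear system, which is consistent with a small minimal sample inside RANSAC. With noisy data, a plain least-squares solve like this one is generally biased, which is the kind of issue a bias-eliminated estimator is meant to address.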