OKVIS2-X: Open Keyframe-Based Visual-Inertial SLAM Configurable with Dense Depth or LiDAR, and GNSS
Simon Boche, Jaehyung Jung, Sebastián Barbas Laina, Stefan Leutenegger
AI summary
Problem
Most existing SLAM systems rely on sparse map representations that lack the geometric detail needed for safe navigation, struggle to scale to large environments, or fail to tightly fuse diverse sensor modalities like LiDAR, neural depth, and GNSS in a unified real-time framework.
Approach
The authors introduce a keyframe-based factor graph optimizer that tightly couples a visual-inertial state estimator with dense volumetric occupancy submaps, enabling seamless multi-sensor fusion and online extrinsic calibration.
Key results
- State-of-the-art trajectory accuracy on EuRoC and Hilti22 benchmarks
- Scalable dense volumetric mapping, demonstrated on sequences of up to 9 kilometers
- Unified fusion of visual, inertial, LiDAR/depth, and GNSS sensors
- Optional online calibration of camera extrinsics for improved accuracy
Why it matters
Enables mobile robots to build globally consistent, dense maps that are directly usable for autonomous navigation, while achieving state-of-the-art localization accuracy across diverse, large-scale scenarios.
Abstract
To empower mobile robots with usable maps as well as the highest state estimation accuracy and robustness, we present OKVIS2-X: a state-of-the-art multi-sensor Simultaneous Localization and Mapping (SLAM) system that builds dense volumetric occupancy maps while scaling to large environments and operating in real time. Our unified SLAM framework seamlessly integrates different sensor modalities: visual, inertial, measured or learned depth, LiDAR, and Global Navigation Satellite System (GNSS) measurements. Unlike most state-of-the-art SLAM systems, we advocate using dense volumetric map representations when leveraging depth or range-sensing capabilities. We employ an efficient submapping strategy that allows our system to scale to large environments, showcased in sequences of up to 9 kilometers. OKVIS2-X enhances its accuracy and robustness by tightly coupling the estimator and submaps through map alignment factors. Our system provides globally consistent maps that are directly usable for autonomous navigation. To further improve the accuracy of OKVIS2-X, we also incorporate the option of performing online calibration of camera extrinsics. Our system achieves the highest trajectory accuracy on EuRoC against state-of-the-art alternatives, outperforms all competitors in the Hilti22 VI-only benchmark while also proving competitive in the LiDAR version, and showcases state-of-the-art accuracy on the diverse and large-scale sequences of the VBR dataset. Code available at: https://github.com/ethz-mrl/OKVIS2-X.
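Illustrative sketch
The abstract does not spell out the form of the map alignment factors, so the following is only a minimal sketch under stated assumptions, not the OKVIS2-X implementation. It assumes that a range measurement, once transformed into a submap frame by the current pose estimate, should land on a surface where some scalar map field evaluates to zero, so the field value itself can serve as a residual. The names `transform`, `map_alignment_residuals`, and `plane_field` are hypothetical, and a flat plane stands in for a real occupancy submap.

```python
from typing import Callable, List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def transform(R: Mat3, t: Vec3, p: Vec3) -> List[float]:
    """Apply the rigid transform q = R * p + t to a 3D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def map_alignment_residuals(
    R: Mat3,
    t: Vec3,
    points: Sequence[Vec3],
    field: Callable[[Vec3], float],
) -> List[float]:
    """Evaluate the map field at each transformed point.

    Assumption: the field is zero on surfaces, so each value is a
    residual that a pose optimizer would try to drive towards zero.
    """
    return [field(transform(R, t, p)) for p in points]


def plane_field(q: Vec3) -> float:
    """Hypothetical stand-in for a submap: a surface at z = 0."""
    return q[2]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.0]]  # measurements on the surface

# A correct pose leaves the points on the surface; a pose that is off
# by 0.1 along z produces a nonzero residual for every point.
aligned = map_alignment_residuals(identity, [0.0, 0.0, 0.0], points, plane_field)
shifted = map_alignment_residuals(identity, [0.0, 0.0, 0.1], points, plane_field)
```

In a factor graph, residuals of this kind would be added alongside the visual and inertial terms, so that a pose error which moves measured points off the mapped surface is penalized together with reprojection and IMU errors.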