ICRA 2026

OKVIS2-X: Open Keyframe-Based Visual-Inertial SLAM Configurable with Dense Depth or LiDAR, and GNSS

Simon Boche, Jaehyung Jung, Sebastián Barbas Laina, Stefan Leutenegger


AI summary

OKVIS2-X unifies visual, inertial, LiDAR/depth, and GNSS data into scalable dense volumetric maps, delivering state-of-the-art accuracy and robustness for large-scale autonomous navigation.
Multi-sensor SLAM, Volumetric Mapping, Visual-Inertial Odometry, LiDAR Fusion, GNSS Integration, Factor Graph Optimization

Problem

Most existing SLAM systems rely on sparse map representations that lack the geometric detail needed for safe navigation, struggle to scale to large environments, or fail to tightly fuse diverse sensor modalities like LiDAR, neural depth, and GNSS in a unified real-time framework.

Approach

The authors introduce a keyframe-based factor graph optimizer that tightly couples a visual-inertial state estimator with dense volumetric occupancy submaps, enabling seamless multi-sensor fusion and online extrinsic calibration.
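
As a rough illustration of what coupling an estimator to a submap through a map alignment factor can look like, the toy sketch below scores a pose estimate by evaluating a submap field at the transformed range measurements. Everything in it is an assumption made for illustration and not taken from the OKVIS2-X code: the 2D pose, the helper names (`transform_point`, `wall_field`, `map_alignment_residuals`), and the use of a signed distance to a single wall as a stand-in for a volumetric occupancy field.

```python
import math

def transform_point(pose, point):
    """Apply a 2D rigid transform pose = (x, y, theta) to a sensor-frame point."""
    x, y, theta = pose
    px, py = point
    c, s = math.cos(theta), math.sin(theta)
    return (x + c * px - s * py, y + s * px + c * py)

def wall_field(point, wall_x=5.0):
    """Hypothetical submap field: signed distance to a wall at x = wall_x.

    The field is zero exactly on the surface, which is the property the
    residual below relies on.
    """
    return point[0] - wall_x

def map_alignment_residuals(pose, sensor_points, field=wall_field):
    """One residual per range measurement: the field value at the mapped point."""
    return [field(transform_point(pose, p)) for p in sensor_points]

def cost(pose, sensor_points):
    """Least-squares cost that an optimizer would minimize over the pose."""
    return 0.5 * sum(r * r for r in map_alignment_residuals(pose, sensor_points))

# A sensor standing at x = 2 and facing the wall measures points 3 m ahead.
sensor_points = [(3.0, -1.0), (3.0, 0.0), (3.0, 1.0)]
true_pose = (2.0, 0.0, 0.0)
drifted_pose = (2.4, 0.0, 0.0)  # 0.4 m of drift along x

print(cost(true_pose, sensor_points))
print(cost(drifted_pose, sensor_points))
```

The cost vanishes at the true pose and grows with drift, so adding such residuals to a factor graph pulls the pose estimate toward agreement with the existing map. A real system would use 3D poses, an interpolated volumetric field, and measurement uncertainty weighting.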

Key results

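  • Highest trajectory accuracy on EuRoC among the compared state-of-the-art systems
  • Best results on the Hilti22 VI-only benchmark and competitive results in its LiDAR version
  • State-of-the-art accuracy on the diverse, large-scale sequences of the VBR dataset
  • Scalable dense volumetric mapping, demonstrated on sequences of up to 9 kilometers
  • Unified fusion of visual, inertial, LiDAR/depth, and GNSS sensors
  • Optional online calibration of camera extrinsics for improved accuracy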

Why it matters

Enables mobile robots to build globally consistent, dense maps that are directly usable for autonomous navigation, while maintaining state-of-the-art localization accuracy across diverse, large-scale scenarios.

Abstract

To empower mobile robots with usable maps as well as highest state estimation accuracy and robustness, we present OKVIS2-X: a state-of-the-art multi-sensor Simultaneous Localization and Mapping (SLAM) system building dense volumetric occupancy maps, while scalable to large environments and operating in real time. Our unified SLAM framework seamlessly integrates different sensor modalities: visual, inertial, measured or learned depth, LiDAR and Global Navigation Satellite System (GNSS) measurements. Unlike most state-of-the-art SLAM systems, we advocate using dense volumetric map representations when leveraging depth or range-sensing capabilities. We employ an efficient submapping strategy that allows our system to scale to large environments, showcased in sequences of up to 9 kilometers. OKVIS2-X enhances its accuracy and robustness by tightly coupling the estimator and submaps through map alignment factors. Our system provides globally consistent maps, directly usable for autonomous navigation. To further improve the accuracy of OKVIS2-X, we also incorporate the option of performing online calibration of camera extrinsics. Our system achieves the highest trajectory accuracy in EuRoC against state-of-the-art alternatives, outperforms all competitors in the Hilti22 VI-only benchmark, while also proving competitive in the LiDAR version, and showcases state-of-the-art accuracy in the diverse and large-scale sequences from the VBR dataset. Code available at: https://github.com/ethz-mrl/OKVIS2-X.
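
The abstract does not say how submaps are created, so the following is only a sketch of one common policy, not the OKVIS2-X strategy: a new submap is started whenever a keyframe lies farther than a threshold from the keyframe that anchors the current submap. The `Submap` class, the `assign_keyframes` function, and the `max_distance` value are all hypothetical names and numbers chosen for illustration.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Submap:
    """Hypothetical submap record: an anchor position plus member keyframe indices."""
    anchor: tuple
    keyframes: list = field(default_factory=list)

def assign_keyframes(positions, max_distance=10.0):
    """Group keyframe positions into submaps with a distance-based policy (assumed).

    A keyframe starts a new submap when it is more than max_distance away
    from the anchor of the most recent submap; otherwise it joins that submap.
    """
    submaps = []
    for idx, pos in enumerate(positions):
        if not submaps or math.dist(pos, submaps[-1].anchor) > max_distance:
            submaps.append(Submap(anchor=pos))
        submaps[-1].keyframes.append(idx)
    return submaps

# Keyframes every 5 m along a straight line: x = 0, 5, ..., 30.
positions = [(float(x), 0.0, 0.0) for x in range(0, 35, 5)]
submaps = assign_keyframes(positions)

for s in submaps:
    print(s.anchor, s.keyframes)
```

Because each submap covers a bounded region, memory and integration cost per submap stay bounded as the trajectory grows, which is the general reason submapping helps a dense mapper scale to long sequences.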

Index terms

SLAM, Mapping, Localization, Sensor Fusion
