ICRA 2026

VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes

Ke Wu, Zicheng Zhang, Muer Tie, Ziqing Ai, Zhongxue Gan, Wenchao Ding

AI summary

VINGS-Mono enables real-time, kilometer-scale outdoor 3D mapping using only a smartphone camera and IMU, outperforming existing Gaussian Splatting SLAM methods in both localization and rendering quality.
Gaussian Splatting, Monocular SLAM, Visual-Inertial Odometry, Large-Scale Mapping, Loop Closure, Real-Time Reconstruction

Problem

Existing Gaussian Splatting SLAM systems are limited to small indoor scenes or rely on expensive LiDAR/depth sensors, while monocular approaches suffer from severe scale drift, high computational demands, and poor handling of dynamic objects in large-scale outdoor environments.

Approach

The framework fuses monocular RGB and IMU data through a dense visual-inertial front end to incrementally build a scalable 2D Gaussian map, leveraging novel view synthesis for efficient loop closure and a dynamic eraser to filter moving objects.
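The idea of using novel view synthesis for loop closure can be illustrated with a minimal sketch. It assumes a loop candidate is checked by rendering the Gaussian map at the candidate pose and comparing that rendering with the current camera frame, ignoring pixels flagged as dynamic. The function names, the PSNR criterion, and the 20 dB threshold are all hypothetical choices for illustration; the source does not state how VINGS-Mono scores candidates.

```python
import numpy as np


def psnr(rendered: np.ndarray, observed: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between two images with values in [0, max_val]."""
    diff = rendered.astype(np.float64) - observed.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def accept_loop(rendered, observed, dynamic_mask=None, threshold_db=20.0) -> bool:
    """Hypothetical check: accept a loop candidate when the view rendered from
    the map agrees with the observed frame on static pixels.

    rendered, observed: (H, W, 3) float images in [0, 1].
    dynamic_mask: optional (H, W) bool array, True where a moving object was
    detected; those pixels are excluded from the comparison.
    threshold_db: assumed acceptance threshold, not a value from the paper.
    """
    if dynamic_mask is None:
        return psnr(rendered, observed) >= threshold_db
    static = ~np.asarray(dynamic_mask, dtype=bool)
    if not static.any():
        return False  # nothing static left to compare
    # Boolean indexing keeps only static pixels, giving arrays of shape (N, 3).
    return psnr(rendered[static], observed[static]) >= threshold_db
```

In this sketch, a frame partly covered by a moving vehicle can still confirm a loop as long as the remaining static pixels match the rendering, which is one plausible way the loop closure and dynamic-object filtering could interact.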

Key results

  • First monocular Gaussian SLAM to operate in kilometer-scale outdoor environments
  • Efficiently manages up to 50 million Gaussian ellipsoids and supports real-time deployment on a smartphone
  • Achieves localization accuracy on par with Visual-Inertial Odometry while surpassing GS/NeRF SLAM baselines
  • Delivers superior mapping and novel view synthesis quality through efficient loop correction and dynamic object removal

Why it matters

Enables affordable, high-fidelity 3D scene reconstruction and navigation for consumer devices and autonomous systems in large-scale urban environments without relying on costly LiDAR or depth sensors.

Abstract

VINGS-Mono is a monocular (inertial) Gaussian Splatting (GS) SLAM framework designed for large scenes. The framework comprises four main components: VIO Front End, 2D Gaussian Map, NVS Loop Closure, and Dynamic Eraser. In the VIO Front End, RGB frames are processed through dense bundle adjustment and uncertainty estimation to extract scene geometry and poses. Based on this output, the mapping module incrementally constructs and maintains a 2D Gaussian map. Key components of the 2D Gaussian Map include a Sample-based Rasterizer, Score Manager, and Pose Refinement, which collectively improve mapping speed and localization accuracy. This enables the SLAM system to handle large-scale urban environments with up to 50 million Gaussian ellipsoids. To ensure global consistency in large-scale scenes, we design a Loop Closure module, which innovatively leverages the Novel View Synthesis (NVS) capabilities of Gaussian Splatting for loop closure detection and correction of the Gaussian map. Additionally, we propose a Dynamic Eraser to address the inevitable presence of dynamic objects in real-world outdoor scenes.
Extensive evaluations in indoor and outdoor environments demonstrate that our approach achieves localization performance on par with Visual-Inertial Odometry while surpassing recent GS/NeRF SLAM methods. It also significantly outperforms all existing methods in terms of mapping and rendering quality. Furthermore, we developed a mobile app and verified that our …

Manuscript received: January 10, 2025; Revised: April 18, 2025; Accepted: August 21, 2025. This paper was recommended for publication by Editor Javier Civera upon evaluation of the Reviewers' comments. This work is sponsored by Shanghai Municipal Science and Technology Major Project under Grant (2021SHZDZX0103), National Natural Science Foundation of China (NSFC) under Grant 62403142, and the Science and Technology Commission of Shanghai Municipality (No. 24511103100). (*Corresponding Author: Wenchao Ding)

Ke Wu, Muer Tie, Ziqing Ai, Zhongxue Gan and Wenchao Ding are with the College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China. (e-mail: dingwenchao@fudan.edu.cn)
Zicheng Zhang is with the College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China.

IEEE Transactions on Robotics (T-RO) paper, presented at ICRA 2026, Vienna, Austria. Cite as T-RO paper. ©2026 IEEE

Index terms

SLAM, Sensor Fusion, Mapping, Gaussian Splatting

Related papers