MCGS-SLAM: A Multi-Camera SLAM Framework Using Gaussian Splatting for High-Fidelity Mapping
Zhihao Cao, Hanyu Wu, Li Wa TANG, Zizhou Luo, Wei Zhang, Marc Pollefeys, Zihan Zhu, Martin R. Oswald
AI summary
Problem
Monocular SLAM systems suffer from scale ambiguity, limited field of view, and drift, while existing multi-camera or neural methods often rely on inertial sensors, produce sparse maps, or lack real-time efficiency. A robust, vision-only framework is needed to leverage multi-camera redundancy for high-fidelity mapping.
Approach
The system synchronizes RGB streams from a calibrated multi-camera rig and fuses them into a continuously optimized 3D Gaussian Splatting map. It jointly refines camera poses and dense depths using Multi-Camera Bundle Adjustment and enforces metric-scale alignment across views with a low-rank prior-based scale consistency module.
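As a minimal sketch of what a calibrated rig provides, the snippet below composes a single rig pose with fixed rig-to-camera extrinsics to obtain every camera's world pose. It assumes that one rig pose is estimated per time step and that extrinsics come from calibration; the summary does not spell out this parameterization. The names `make_pose` and `camera_poses_from_rig` and the two-camera layout are hypothetical and not taken from the paper.

```python
import numpy as np

def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose

def camera_poses_from_rig(T_world_rig, T_rig_cams):
    """Compose one rig pose with each fixed rig-to-camera extrinsic.

    Assumption: T_rig_cam maps camera coordinates into rig coordinates,
    so T_world_cam = T_world_rig @ T_rig_cam.
    """
    return [T_world_rig @ T_rig_cam for T_rig_cam in T_rig_cams]

# Hypothetical two-camera rig: a front camera at the rig origin and a
# side camera rotated 90 degrees about the y axis, offset 0.2 m along x.
T_rig_front = make_pose(np.eye(3), [0.0, 0.0, 0.0])
R_side = np.array([[0.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0],
                   [-1.0, 0.0, 0.0]])
T_rig_side = make_pose(R_side, [0.2, 0.0, 0.0])

# A single rig pose (here a pure translation) determines both camera poses.
T_world_rig = make_pose(np.eye(3), [1.0, 0.0, 0.0])
T_world_front, T_world_side = camera_poses_from_rig(
    T_world_rig, [T_rig_front, T_rig_side]
)
```

Under this assumption, adding cameras adds observations to the optimization without adding pose unknowns, which is one way multi-camera redundancy can constrain the trajectory.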
Key results
- First purely RGB-based multi-camera SLAM system built on 3D Gaussian Splatting
- Joint pose and depth optimization via Multi-Camera Bundle Adjustment
- Scale consistency module enforcing metric alignment across overlapping views
- Real-time tracking and photorealistic reconstruction on Waymo and synthetic datasets, usually outperforming monocular baselines
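To illustrate what aligning scale across overlapping views can mean in the simplest case, the sketch below fits one scalar that brings a source depth map onto a reference depth map over shared valid pixels, using a closed-form least-squares solution. This is a generic stand-in, not the paper's low-rank prior-based module, whose details are not given here. The function name `align_scale` and the toy depth values are hypothetical.

```python
import numpy as np

def align_scale(depth_src, depth_ref, valid):
    """Return the scalar s minimizing ||s * depth_src - depth_ref||^2
    over the pixels selected by the boolean mask `valid`."""
    src = depth_src[valid]
    ref = depth_ref[valid]
    if src.size == 0:
        raise ValueError("no overlapping valid pixels")
    # Closed-form least squares: s = <src, ref> / <src, src>
    return float(np.dot(src, ref) / np.dot(src, src))

# Toy overlap region: the reference depths are the source depths times 2.5.
depth_src = np.array([[1.0, 2.0],
                      [4.0, 0.0]])   # 0.0 marks a pixel without depth
depth_ref = 2.5 * depth_src
valid = depth_src > 0

scale = align_scale(depth_src, depth_ref, valid)
```

A single global scalar is the crudest possible model; it is shown only to make the notion of cross-view scale alignment concrete.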
Why it matters
Enables scalable, vision-only high-fidelity 3D mapping and safe autonomous navigation without relying on inertial sensors or costly offline processing.
Abstract
Recent progress in dense SLAM has primarily targeted monocular setups, often at the expense of robustness and geometric coverage. We present MCGS-SLAM, the first purely RGB-based multi-camera SLAM system built on 3D Gaussian Splatting (3DGS). Unlike prior methods relying on sparse maps or inertial data, MCGS-SLAM fuses dense RGB inputs from multiple viewpoints into a unified, continuously optimized Gaussian map. A multi-camera bundle adjustment (MCBA) jointly refines poses and depths via dense photometric and geometric residuals, while a scale consistency module enforces metric alignment across views using low-rank priors. The system supports RGB input and maintains real-time performance at large scale. Experiments on synthetic and real-world datasets show that MCGS-SLAM consistently yields accurate trajectories and photorealistic reconstructions, usually outperforming monocular baselines. Notably, the wide field of view from multi-camera input enables reconstruction of side-view regions that monocular setups miss, critical for safe autonomous operation. These results highlight the promise of multi-camera Gaussian Splatting SLAM for high-fidelity mapping in robotics and autonomous driving.
∗ Zihan Zhu is the Project Lead of this work.
1 Zhihao Cao is with the Department of Mathematics, ETH Zurich, Switzerland. (e-mail: zhicao@student.ethz.ch)
2 Hanyu Wu and Li Wa Tang are with the Department of Mechanical and Process Engineering, ETH Zurich, Switzerland. (e-mail: hanywu@student.ethz.ch; litang1@student.ethz.ch)
3 Zizhou Luo is with the Department of Informatics, University of Zurich, Switzerland. (e-mail: zizhou.luo@uzh.ch)
4 Wei Zhang is with the Institute for Photogrammetry, University of Stuttgart, Germany. (e-mail: wei.zhang@ifp.uni-stuttgart.de)
5 Marc Pollefeys and Zihan Zhu are with Computer Vision and Geometry Group, ETH Zurich, 8092 Zurich, Switzerland. (e-mail: zihan.zhu@inf.ethz.ch; marc.pollefeys@inf.ethz.ch)
6 Marc Pollefeys is also with Microsoft Spatial AI Lab, 8038 Zurich, Switzerland. (e-mail: mapoll@microsoft.com)
7 Martin R. Oswald is with Computer Vision Research Group, University of Amsterdam, Netherlands. (e-mail: m.r.oswald@uva.nl)