ICRA 2026

Have We Mastered Scale in Deep Monocular Visual SLAM? The ScaleMaster Dataset and Benchmark

Hyoseok Ju, Bokeon Suh, Giseop Kim

AI summary

State-of-the-art deep monocular SLAM systems suffer from severe scale inconsistency and geometric distortion in large-scale indoor environments, a critical flaw hidden by traditional trajectory-only benchmarks.

Monocular SLAM, Scale Consistency, Deep Learning, Benchmark Dataset, 3D Reconstruction, Map Quality Metrics

Problem

Existing benchmarks for deep monocular visual SLAM are limited to room-scale or structurally simple settings, failing to address intra-session scale drift and inter-session scale ambiguity in large, complex indoor environments.

Approach

The authors introduce the ScaleMaster Dataset, a challenging benchmark featuring multi-floor structures, long trajectories, repetitive views, and low-texture regions, and evaluate state-of-the-art SLAM systems using both trajectory metrics and direct map-to-map quality assessments.

Key results

Introduction of the ScaleMaster Dataset with 25 challenging indoor sequences targeting scale inconsistency
Demonstration that leading deep SLAM systems exhibit severe scale drift and trajectory errors of up to ~90 m in long-range sequences
Revelation that traditional ATE metrics mask geometric failures while map-to-map metrics expose hidden scale collapses
Establishment of a new evaluation protocol highlighting vulnerabilities to intra-session drift and inter-session ambiguity
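
One simple way to make intra-session scale drift visible is to compare the path length of the estimated trajectory with that of the ground truth over consecutive windows. The sketch below is an illustration only and is not taken from the paper's evaluation protocol; the function names `path_length` and `windowed_scale`, the window size, and the toy trajectories are all assumptions. A ratio that stays constant indicates a consistent (if unknown) scale, while a ratio that changes along the sequence indicates drift.

```python
import math


def path_length(points):
    """Sum of Euclidean distances between consecutive 3D positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def windowed_scale(est, gt, window):
    """Per-window ratio of estimated to ground-truth path length.

    est and gt are time-aligned lists of (x, y, z) positions of equal length.
    Windows whose ground-truth path length is zero are skipped.
    """
    if len(est) != len(gt):
        raise ValueError("trajectories must be time-aligned and equally long")
    ratios = []
    for start in range(0, len(est) - window, window):
        gt_len = path_length(gt[start:start + window + 1])
        if gt_len > 0.0:
            est_len = path_length(est[start:start + window + 1])
            ratios.append(est_len / gt_len)
    return ratios


# Hypothetical toy data: the estimate shrinks to half scale midway through.
gt = [(float(i), 0.0, 0.0) for i in range(9)]
est = [(x, 0.0, 0.0) for x in (0.0, 1.0, 2.0, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0)]

ratios = windowed_scale(est, gt, window=4)
print(ratios)                      # per-window scale ratios
print(max(ratios) / min(ratios))   # a value far from 1 suggests scale drift
```

A single global alignment would fit one scale to the whole toy trajectory and spread the error across it, which is the kind of averaging that can hide a localized scale collapse.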

Why it matters

Provides a critical benchmark and evaluation framework for researchers developing reliable, large-scale indoor mapping systems, exposing limitations of current deep SLAM approaches.

Abstract

Recent advances in deep monocular visual Simultaneous Localization and Mapping (SLAM) have achieved impressive accuracy and dense reconstruction capabilities, yet their robustness to scale inconsistency in large-scale indoor environments remains largely unexplored. Existing benchmarks are limited to room-scale or structurally simple settings, leaving critical issues of intra-session scale drift and inter-session scale ambiguity insufficiently addressed. To fill this gap, we introduce the ScaleMaster Dataset, the first benchmark explicitly designed to evaluate scale consistency under challenging scenarios such as multi-floor structures, long trajectories, repetitive views, and low-texture regions. We systematically analyze the vulnerability of state-of-the-art deep monocular visual SLAM systems to scale inconsistency, providing both quantitative and qualitative evaluations. Crucially, our analysis extends beyond traditional trajectory metrics to include a direct map-to-map quality assessment using metrics like Chamfer distance against high-fidelity 3D ground truth. Our results reveal that while recent deep monocular visual SLAM systems demonstrate strong performance on existing benchmarks, they suffer from severe scale-related failures in realistic, large-scale indoor environments. By releasing the ScaleMaster Dataset and baseline results, we aim to establish a foundation for future research toward developing scale-consistent and reliable visual SLAM systems.
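
For readers unfamiliar with the map-to-map metric named in the abstract, the following is a minimal sketch of a symmetric Chamfer distance between two point sets. Definitions vary in the literature (squared or unsquared distances, sums or means), and the abstract does not state which variant the authors use, so the averaged, unsquared form here is an assumption. The brute-force search and the toy point sets are for illustration only; real maps would need a spatial index such as a k-d tree.

```python
import math


def one_sided(src, dst):
    """Mean distance from each point in src to its nearest neighbour in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: sum of the two one-sided mean distances."""
    return one_sided(a, b) + one_sided(b, a)


# Hypothetical toy maps: the estimate is the ground truth shrunk to half scale.
gt_map = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
est_map = [(0.5 * x, 0.5 * y, 0.5 * z) for x, y, z in gt_map]

print(chamfer_distance(gt_map, gt_map))   # identical maps give zero
print(chamfer_distance(est_map, gt_map))  # a scale error gives a positive value
```

Because the metric compares geometry directly, a map reconstructed at the wrong scale is penalized even when the shape is otherwise correct.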

Index terms

Data Sets for SLAM, SLAM, Mapping