ICRA 2026

GRS-SLAM3R: Real-Time Dense SLAM with Gated Recurrent State

Guole Shen, Tianchen Deng, Yanbo Wang, Yongtao Chen, Yilin Shen, Jiuming Liu, Jingchuan Wang

AI summary

GRS-SLAM3R achieves real-time, globally consistent dense SLAM in large-scale scenes by combining a gated recurrent memory mechanism with hierarchical submap alignment.

Dense SLAM, Gated Recurrent Memory, 3D Reconstruction, Real-time Mapping, Spatial Consistency, DUSt3R

Problem

Existing DUSt3R-based SLAM methods process image pairs in isolation, neglecting spatial memory and global consistency, which causes drift and poor scalability in long-sequence or large-scale environments.

Approach

The framework incrementally updates a latent spatial memory using transformer-based reset and update gates, then aligns the scene hierarchically through local submap refinement and inter-submap registration.
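The gating idea can be illustrated with a minimal sketch, assuming a plain GRU-style update with linear gates in place of the paper's transformer-based gates. The names `gated_update`, `init_params`, `Wz`, `Wr`, and `Wh` are hypothetical and chosen only for this illustration; this is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(dim, seed=0):
    """Random weights for the three linear maps (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {k: rng.normal(scale=0.1, size=(2 * dim, dim)) for k in ("Wz", "Wr", "Wh")}

def gated_update(state, obs, params):
    """One GRU-style step. state and obs are (n_tokens, dim) arrays.

    Assumption: linear gates stand in for the transformer-based gates
    described in the summary.
    """
    xs = np.concatenate([state, obs], axis=-1)      # (n_tokens, 2*dim)
    z = sigmoid(xs @ params["Wz"])                  # update gate: how much to overwrite
    r = sigmoid(xs @ params["Wr"])                  # reset gate: how much old state to expose
    cand_in = np.concatenate([r * state, obs], axis=-1)
    cand = np.tanh(cand_in @ params["Wh"])          # candidate new memory
    return (1.0 - z) * state + z * cand             # blend old memory with candidate
```

Starting from a zero state and calling `gated_update` once per incoming frame feature keeps the memory at a fixed size regardless of sequence length, which is the property that makes a recurrent state attractive for long sequences.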

Key results

Novel gated recurrent latent state for consistent multi-frame spatial correlation
Hierarchical multi-submap alignment to bound drift and preserve global consistency
Superior reconstruction accuracy and pose estimation on large-scale and long-sequence datasets
Real-time dense mapping without requiring camera intrinsics or depth priors

Why it matters

Provides a scalable, drift-resistant dense mapping solution for robotics and spatial computing in unconstrained, large-scale environments.

Abstract

DUSt3R-based end-to-end scene reconstruction has recently shown promising results in dense visual SLAM. However, most existing methods estimate pointmaps from image pairs only, overlooking spatial memory and global consistency. To address this, we introduce GRS-SLAM3R, an end-to-end SLAM framework for dense scene reconstruction and pose estimation from RGB images without any prior knowledge of the scene or camera parameters. Unlike existing DUSt3R-based frameworks, which operate on all image pairs and predict per-pair pointmaps in local coordinate frames, our method supports sequential input and incrementally estimates metric-scale point clouds in the global coordinate frame. To improve the consistency of spatial correlation, we use a latent state as spatial memory and design a transformer-based gated update module that resets and updates this memory, continuously aggregating and tracking relevant 3D information across frames. Furthermore, we partition the scene into submaps, apply local alignment within each submap, and register all submaps into a common world frame using relative constraints, producing a globally consistent map. Experiments on various datasets show that our framework achieves superior reconstruction accuracy while maintaining real-time performance.
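As a rough illustration of registering submaps into a common world frame from relative constraints, the sketch below simply chains 4x4 rigid transforms, taking submap 0 as the world frame. This is an assumption made for clarity: the abstract does not specify the registration procedure, and a real system would typically refine the poses by optimization rather than by composition alone. The helper names `make_pose`, `chain_submaps`, and `to_world` are hypothetical.

```python
import numpy as np

def make_pose(yaw, t):
    """4x4 rigid transform: rotation about the z-axis by yaw (radians), then translation t."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T

def chain_submaps(relative):
    """relative[i] maps submap i+1 coordinates into submap i coordinates.

    Returns world-from-submap poses, with submap 0 defining the world frame.
    """
    poses = [np.eye(4)]
    for T_rel in relative:
        poses.append(poses[-1] @ T_rel)
    return poses

def to_world(points, T):
    """Apply a 4x4 rigid transform to an (n, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]

# Two relative constraints: a 90-degree turn with a 1 m step, then a straight 1 m step.
relative = [make_pose(np.pi / 2, [1.0, 0.0, 0.0]), make_pose(0.0, [1.0, 0.0, 0.0])]
poses = chain_submaps(relative)
origin_of_submap_2 = to_world(np.zeros((1, 3)), poses[2])
```

Pure composition accumulates error along the chain, which is why the local alignment within each submap matters: it keeps each relative constraint accurate before the submaps are placed in the world frame.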

Index terms

SLAM, Mapping, Computer Vision for Transportation