ICRA 2026

3D Foundation Model-Based Loop Closing for Decentralized Collaborative SLAM

Pierre-Yves Lajoie, Benjamin Ramtoula, Daniele De Martini, Giovanni Beltrame

AI summary

Leveraging 3D foundation models for inter-robot loop closing enables robust, scalable decentralized collaborative SLAM with significant accuracy and efficiency gains.

Decentralized C-SLAM, 3D Foundation Models, Loop Closing, Multi-Robot SLAM, Pose Graph Optimization, Monocular Vision

Problem

Decentralized collaborative SLAM struggles to identify map overlaps and establish reliable inter-robot loop closures when robots have significantly different viewpoints, especially under bandwidth constraints that prevent centralized processing or full map sharing.

Approach

The method integrates the MASt3R foundation model to estimate relative poses from monocular image pairs, paired with a novel confidence metric and specialized pose graph optimization to resolve scale ambiguities in a decentralized pipeline.
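A relative translation estimated from a monocular image pair is generally known only up to scale, which is the ambiguity the optimization has to resolve. The following is a minimal sketch of that idea, not the paper's formulation (which this summary does not describe). It assumes, hypothetically, that a metric estimate of the same displacement is available from another source, such as the robots' current trajectory estimates, and fits the single scale factor in closed form. The names `fit_scale` and `scaled_residual` are illustrative.

```python
import math


def fit_scale(t_dir, t_metric):
    """Least-squares scale s minimizing ||s * t_dir - t_metric||^2.

    t_dir:    scale-free translation from the image pair (any non-zero length).
    t_metric: metric translation for the same displacement (assumed available).
    """
    dot = sum(a * b for a, b in zip(t_dir, t_metric))
    norm_sq = sum(a * a for a in t_dir)
    if norm_sq == 0.0:
        raise ValueError("scale-free translation must be non-zero")
    return dot / norm_sq


def scaled_residual(t_dir, t_metric):
    """Return the fitted scale and the leftover translation error in metric units."""
    s = fit_scale(t_dir, t_metric)
    err = math.sqrt(sum((s * a - b) ** 2 for a, b in zip(t_dir, t_metric)))
    return s, err


if __name__ == "__main__":
    # Direction agrees with the metric displacement, so the residual is near zero.
    print(scaled_residual([0.6, 0.8, 0.0], [3.0, 4.0, 0.0]))
    # Direction disagrees, so some error remains after the best scale is applied.
    print(scaled_residual([1.0, 0.0, 0.0], [2.0, 1.0, 0.0]))
```

A large leftover error after fitting the scale suggests that the measured direction disagrees with the metric estimate, which is one simple signal a pipeline could use to treat a loop closure with suspicion.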

Key results

Robust inter-robot loop closure detection via MASt3R on monocular images
Novel confidence metric and outlier mitigation for spurious match filtering
Specialized pose graph optimization to resolve loop closure scale ambiguities
Improved localization accuracy and reduced computational/memory overhead vs. baselines
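
For readers unfamiliar with scale-ambiguous loop closures, one generic way to write such a pose graph objective is sketched below. This is an assumed textbook-style formulation for illustration only; the paper's actual formulation is not given in this summary. The symbols are assumptions: $T_i = (R_i, t_i)$ are robot poses, $Z_{ij}$ are metric odometry measurements, $(\hat{R}_k, \hat{t}_k)$ is the scale-free relative pose of loop closure $k$ between poses $a_k$ and $b_k$, $s_k > 0$ is an unknown scale, and $\rho$ is a robust loss.

```latex
\min_{\{T_i\},\,\{s_k\}} \;
  \sum_{(i,j) \in \mathcal{O}}
    \left\| \operatorname{Log}\!\left( Z_{ij}^{-1} \, T_i^{-1} T_j \right) \right\|^2_{\Sigma_{ij}}
  \; + \;
  \sum_{k \in \mathcal{L}} \rho\!\left( \left\| r_k \right\|^2_{\Sigma_k} \right),
\qquad
r_k =
\begin{bmatrix}
  \operatorname{Log}\!\left( \hat{R}_k^{\top} R_{a_k}^{\top} R_{b_k} \right) \\[2pt]
  R_{a_k}^{\top} \left( t_{b_k} - t_{a_k} \right) - s_k \, \hat{t}_k
\end{bmatrix}
```

Here the rotation part of each loop closure constrains the poses directly, while the translation part only constrains them up to the extra variable $s_k$, so the metric scale has to come from the odometry terms.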

Why it matters

Enables efficient, large-scale multi-robot mapping in unknown environments where bandwidth is limited and robots encounter diverse viewpoints.

Abstract

Decentralized Collaborative Simultaneous Localization and Mapping (C-SLAM) techniques often struggle to identify map overlaps due to significant viewpoint variations among robots. Motivated by recent advancements in 3D foundation models, which can register images despite large viewpoint differences, we propose a robust loop closing approach that leverages these models to establish inter-robot measurements. In contrast to resource-intensive methods requiring full 3D reconstruction within a centralized map, our approach integrates foundation models into existing SLAM pipelines, yielding scalable and robust multi-robot mapping. Our contributions include: 1) integrating 3D foundation models to reliably estimate relative poses from monocular image pairs within decentralized C-SLAM; 2) introducing robust outlier mitigation techniques critical to the use of these relative poses; and 3) developing specialized pose graph optimization formulations that efficiently resolve scale ambiguities. We evaluate our method against state-of-the-art approaches, demonstrating improvements in localization and mapping accuracy, alongside significant gains in computational and memory efficiency. These results highlight the potential of our approach for deployment in large-scale multi-robot scenarios.

Index terms

Multi-Robot SLAM, SLAM, Localization