ICRA 2026

Multi-Modal Loop Closure Detection with Foundation Models in Severely Unstructured Environments

Laura Alejandra Encinar Gonzalez, John Folkesson, Rudolph Triebel, Riccardo Giubilato

AI summary

MPRF unifies transformer-based visual retrieval and LiDAR geometric verification to deliver robust, metric loop closure detection in low-texture, GNSS-denied environments.

Loop closure detection, Multimodal SLAM, Foundation models, DINOv2, LiDAR place recognition, Unstructured environments

Problem

Visual place recognition fails in unstructured, feature-sparse terrains due to weak textures and aliasing, while existing multi-modal pipelines typically output only similarity scores without the explicit 6-DoF pose constraints required for direct SLAM integration.

Approach

The pipeline uses a two-stage DINOv2-based visual retrieval strategy for efficient candidate screening, followed by SONATA-based LiDAR descriptors to compute explicit 6-DoF relative poses through RANSAC geometric verification.
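The two-stage retrieval idea can be illustrated with a minimal sketch. This is not the MPRF implementation: the descriptors here are plain NumPy arrays standing in for DINOv2-derived features, and the second stage (re-ranking by mutual nearest-neighbour matches between local descriptors) is an assumption chosen for illustration, since the summary does not specify how candidates are refined. The function names `coarse_candidates` and `rerank_by_local_matches` are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def coarse_candidates(query_global, db_global, k):
    """Stage 1: rank database entries by cosine similarity of global descriptors."""
    sims = l2_normalize(db_global) @ l2_normalize(query_global)
    order = np.argsort(-sims)[:k]
    return order, sims[order]

def rerank_by_local_matches(query_local, db_local, candidates):
    """Stage 2 (assumed): re-rank candidates by mutual nearest-neighbour count."""
    q = l2_normalize(query_local)
    scores = []
    for idx in candidates:
        d = l2_normalize(db_local[idx])
        sim = q @ d.T                 # (num_query_patches, num_db_patches)
        nn_q = sim.argmax(axis=1)     # best database patch for each query patch
        nn_d = sim.argmax(axis=0)     # best query patch for each database patch
        mutual = np.sum(nn_d[nn_q] == np.arange(len(q)))
        scores.append(int(mutual))
    order = np.argsort(-np.asarray(scores), kind="stable")
    return [int(candidates[i]) for i in order], [scores[i] for i in order]
```

The cheap global comparison touches every database entry, while the more expensive local matching runs only on the `k` shortlisted candidates, which is what keeps the per-query cost bounded.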

Key results

Achieves 75.7% Precision@1 on S3LI Etna and 78.3% on Vulcano sequences
Maintains end-to-end retrieval runtime under 500 ms per query
Delivers reliable 6-DoF pose estimates with over 69% of yaw predictions within 10° of ground truth
Outperforms uni-modal and retrieval-only baselines in accuracy and efficiency trade-offs
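For readers unfamiliar with the metrics quoted above, the following sketch shows one common way to compute them. The exact evaluation protocol of the paper is not given in this summary, so the definitions are assumptions: Precision@1 is taken as the fraction of queries whose top-ranked candidate is a true revisit, and the yaw criterion is taken as an absolute angular error of at most 10° after wrapping. All function names are hypothetical.

```python
def precision_at_1(top1_predictions, positives):
    """Fraction of queries whose top-ranked candidate is in its set of true revisits."""
    hits = sum(1 for pred, pos in zip(top1_predictions, positives) if pred in pos)
    return hits / len(top1_predictions)

def yaw_error_deg(estimated_deg, ground_truth_deg):
    """Absolute yaw difference, wrapped so that 359 deg vs 1 deg counts as 2 deg."""
    diff = (estimated_deg - ground_truth_deg + 180.0) % 360.0 - 180.0
    return abs(diff)

def fraction_within(errors, threshold_deg=10.0):
    """Share of errors at or below the threshold (inclusive bound is an assumption)."""
    return sum(1 for e in errors if e <= threshold_deg) / len(errors)
```

Wrapping the angular difference matters near the 0°/360° boundary, where a naive subtraction would report a large error for two nearly identical headings.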

Why it matters

Provides a reliable, interpretable loop closure solution for autonomous planetary rovers and GNSS-denied SLAM systems navigating severely unstructured terrains.

Abstract

Robust loop closure detection is a critical component of Simultaneous Localization and Mapping (SLAM) algorithms in GNSS-denied environments, such as in the context of planetary exploration. In these settings, visual place recognition often fails due to aliasing and weak textures, while LiDAR-based methods suffer from sparsity and ambiguity. This paper presents MPRF, a multimodal pipeline that leverages transformer-based foundation models for both vision and LiDAR modalities to achieve robust loop closure in severely unstructured environments. Unlike prior work limited to retrieval, MPRF integrates a two-stage visual retrieval strategy with explicit 6-DoF pose estimation, combining DINOv2 features with SALAD aggregation for efficient candidate screening and SONATA-based LiDAR descriptors for geometric verification. Experiments on the S3LI dataset and S3LI Vulcano dataset show that MPRF outperforms state-of-the-art retrieval methods in precision while enhancing pose estimation robustness in low-texture regions. By providing interpretable correspondences compatible with SLAM back-ends, MPRF achieves a favorable trade-off between accuracy, efficiency, and reliability, demonstrating the potential of foundation models to unify place recognition and pose estimation. Code and models will be released at github.com/DLR-RM/MPRF.
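As a generic illustration of how a 6-DoF relative pose can be recovered from descriptor-matched 3D points with RANSAC, the sketch below combines a three-point sampler with the Kabsch least-squares alignment. It is not taken from the MPRF code: the putative correspondences `src[i] <-> dst[i]` are assumed to be given (for example from nearest-neighbour matching of point descriptors), the inlier threshold and iteration count are placeholder values, and `yaw_deg` assumes a Z-up frame.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def ransac_rigid(src, dst, threshold=0.2, iterations=200, seed=0):
    """Estimate (R, t) from putative 3D correspondences that contain outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[sample], dst[sample])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        return None  # verification failed: no consistent rigid motion found
    # Refit on all inliers of the best hypothesis for the final estimate.
    R, t = kabsch(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers

def yaw_deg(R):
    """Rotation about the vertical axis in degrees (assumes Z is up)."""
    return float(np.degrees(np.arctan2(R[1, 0], R[0, 0])))
```

The inlier count doubles as a verification signal: a candidate whose best hypothesis explains only a handful of correspondences can be rejected, while an accepted one yields a transform that a SLAM back-end can consume as a relative pose constraint.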

Index terms

Space Robotics and Automation, Localization, Field Robots