AIM-SLAM: Dense Monocular SLAM Via Adaptive and Informative Multi-View Keyframe Prioritization with Foundation Model
Jinwoo Jeon, Dong-Uk Seo, Eungchang Mason Lee, Hyun Myung
AI summary
Problem
Existing foundation model-based SLAM systems rely on fixed-length or temporally consecutive keyframe windows, which often include redundant frames with limited geometric information gain, leading to structural inconsistencies and scale drift.
Approach
The framework uses a SIGMA module to adaptively prioritize keyframes based on 3D voxel overlap and information gain, then jointly optimizes their poses in Sim(3) space for consistent dense reconstruction.
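The selection idea can be illustrated with a minimal sketch. It is not the paper's SIGMA module: the function names, the thresholds, the definition of overlap as a fraction of shared voxels, and the definition of information gain as the count of newly covered voxels are all assumptions made here for illustration.

```python
import math

def voxelize(points, voxel_size):
    """Return the set of occupied voxel indices for a list of (x, y, z) points."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}

def voxel_overlap(a, b):
    """Fraction of voxels in `a` that are also occupied in `b`."""
    return len(a & b) / len(a) if a else 0.0

def select_keyframes(current, candidates, min_overlap=0.2, min_gain=1, max_views=4):
    """Greedy, hypothetical prioritization of candidate keyframes.

    current:    voxel set of the current frame
    candidates: dict mapping keyframe id -> voxel set
    A candidate is eligible only if it overlaps the current frame enough;
    eligible candidates are then picked in order of how many voxels they add
    that are not yet covered. Selection stops once the best remaining gain
    drops below `min_gain`, so the number of selected views is adaptive.
    """
    covered = set(current)
    pool = {k: v for k, v in candidates.items()
            if voxel_overlap(current, v) >= min_overlap}
    chosen = []
    while pool and len(chosen) < max_views:
        best = max(pool, key=lambda k: len(pool[k] - covered))
        if len(pool[best] - covered) < min_gain:
            break  # remaining views are redundant
        chosen.append(best)
        covered |= pool.pop(best)
    return chosen
```

In this sketch a keyframe that sees the same voxels as the current frame passes the overlap test but is rejected for adding nothing new, while a keyframe with no shared geometry is filtered out before the gain is computed.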
Key results
- Introduces SIGMA module for adaptive multi-view keyframe prioritization
- Formulates joint multi-view Sim(3) optimization for uncalibrated inputs
- Achieves state-of-the-art pose estimation accuracy on real-world datasets
- Enables accurate, globally consistent dense 3D reconstruction with ROS integration
Why it matters
It provides a scalable, calibration-free SLAM framework that maximizes the utility of geometric foundation models for robotics and autonomous navigation applications.
Abstract
Geometric foundation models have recently emerged as a promising alternative for addressing the challenge of dense reconstruction in monocular visual simultaneous localization and mapping (SLAM). Although geometric foundation models enable SLAM to leverage a variable number of input views, previous methods remain confined to two-view pairs or fixed-length inputs and do not sufficiently consider geometric context when selecting views. To tackle this problem, we propose AIM-SLAM, a dense monocular SLAM framework that exploits adaptive and informative multi-view keyframe prioritization with dense pointmap predictions from the visual geometry grounded transformer (VGGT). Specifically, we introduce the selective information- and geometric-aware multi-view adaptation (SIGMA) module, which employs voxel overlap and information gain to retrieve a candidate set of keyframes and adaptively determine its size. Furthermore, we formulate a joint multi-view Sim(3) optimization that enforces consistent alignment across the selected views, substantially improving pose estimation accuracy. The effectiveness of AIM-SLAM is demonstrated on real-world datasets, where it achieves state-of-the-art pose estimation performance and accurate dense reconstruction results. Our system supports ROS integration, and code is available at https://aimslam.github.io/.
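For readers unfamiliar with Sim(3), the following sketch shows what a similarity transform is: a rotation R, a translation t, and a scale s acting on a point as p' = s R p + t. The extra scale term matters because monocular input does not fix metric scale. The `Sim3` class and its helper functions are hypothetical names introduced here; the sketch covers only the group operations and does not reproduce the paper's joint optimization.

```python
from dataclasses import dataclass
from typing import List

def matvec(R, v):
    """3x3 matrix times 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(R):
    return [[R[j][i] for j in range(3)] for i in range(3)]

@dataclass
class Sim3:
    s: float              # scale, must be positive
    R: List[List[float]]  # 3x3 rotation matrix
    t: List[float]        # translation

    def apply(self, p):
        """Map a point: p' = s * R p + t."""
        Rp = matvec(self.R, p)
        return [self.s * Rp[i] + self.t[i] for i in range(3)]

    def compose(self, other):
        """Return the transform that applies `other` first, then `self`."""
        Rt = matvec(self.R, other.t)
        return Sim3(self.s * other.s,
                    matmul(self.R, other.R),
                    [self.s * Rt[i] + self.t[i] for i in range(3)])

    def inverse(self):
        """Inverse transform: scale 1/s, rotation R^T, translation -(1/s) R^T t."""
        Rt = transpose(self.R)
        v = matvec(Rt, self.t)
        return Sim3(1.0 / self.s, Rt, [-x / self.s for x in v])
```

A quick usage example: with a 90-degree rotation about the z-axis, scale 2, and translation (1, 0, 0), `Sim3(2.0, [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [1.0, 0.0, 0.0]).apply([1.0, 0.0, 0.0])` maps the point to (1, 2, 0), and applying the inverse maps it back.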