IROS 2024

ASML-VDIO: Visual-Depth-Inertial Odometry using Selected Accurate and Stable Multi-Modal Landmarks in Structural Environments

Xingjian Luo, Chenglin Pang, Xuankang Wu, Zheng Fang

Abstract

In complex indoor structural scenes such as shopping centers and malls, camera pose estimation using pure point features is prone to failure because it is difficult to extract sufficient and stable point features in weakly textured or dynamic environments. Recent works have attempted to address these challenges by introducing line features. However, adding line features increases the number of parameters and landmarks in BA (Bundle Adjustment), which reduces efficiency. This is a common issue in multi-modal SLAM (Simultaneous Localization And Mapping). To address this issue, this paper proposes a novel visual-depth-inertial odometry framework (ASML-VDIO) that combines RGB-D and IMU sensors. To improve the efficiency of BA, the proposed landmark classification method classifies 3D landmarks into accurate landmarks and other landmarks based on spatial consistency verification and depth range limitation. The accurate landmarks are then held fixed, and only the other landmarks are optimized during BA. Furthermore, to remove line features extracted from dynamic objects (pedestrians, shopping carts, etc.), we propose a dynamic line removal method that combines geometric constraints and motion constraints of line features. Finally, the method is evaluated on public and author-collected datasets, showing competitive accuracy and robustness in complex indoor structural scenes while achieving a 71% speedup on the optimization thread under the same constraints.
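
As a rough illustration of the landmark classification idea, the following minimal sketch splits landmarks into an "accurate" set (to be held fixed in BA) and an "other" set (to be optimized). It is not the authors' implementation: the function name `classify_landmarks`, the input layout, the thresholds `depth_min`, `depth_max`, and `max_spread`, and the centroid-distance test used as a stand-in for spatial consistency verification are all assumptions made for this example.

```python
from math import dist


def classify_landmarks(observations, depth_min=0.3, depth_max=5.0, max_spread=0.05):
    """Split landmark ids into (accurate, other).

    observations: dict mapping landmark id -> list of (point_world, depth),
    where point_world is an (x, y, z) estimate back-projected from one frame
    and depth is the measured depth in metres. All thresholds are
    illustrative assumptions, not values from the paper.
    """
    accurate, other = [], []
    for landmark_id, obs in observations.items():
        points = [p for p, _ in obs]
        depths = [d for _, d in obs]

        # Depth range limitation: every measurement must lie in the trusted range.
        in_range = all(depth_min <= d <= depth_max for d in depths)

        # Spatial consistency (assumed form): per-frame 3D estimates must
        # agree with their centroid to within max_spread metres.
        centroid = tuple(sum(c) / len(points) for c in zip(*points))
        consistent = len(points) >= 2 and all(
            dist(p, centroid) <= max_spread for p in points
        )

        (accurate if in_range and consistent else other).append(landmark_id)
    return accurate, other
```

In a BA back end, the ids in the first list would be added as constant parameters and only those in the second list would remain free variables, which is where the reduction in optimization cost would come from.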

Index terms

SLAM, Visual-Inertial SLAM