DVMM: A Dual-View Combination Descriptor for Multi-Modal LiDARs Online Place Recognition
Xuzhe Duan, Qingwu Hu, Mingyao Ai, Pengcheng Zhao, Jiayuan Li
AI summary
Problem
Existing place recognition descriptors designed for single-agent SLAM fail to handle the inherent differences across multi-modal LiDARs in collaborative SLAM systems, such as scanning density, range, mounting height, and horizontal and vertical fields of view (HFOV, VFOV).
Approach
The method projects point clouds onto an adaptive grid and applies Gaussian-weighted column summation to generate a 1D azimuthal descriptor for coarse loop candidate retrieval. It then verifies candidates by matching binary cross-section occupancy images encoded from the points within a fixed height range.
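To make the detection-stage descriptor concrete, here is a minimal sketch of a 1D azimuthal descriptor built by grid projection and Gaussian-weighted column summation. It is not the authors' implementation: the function name `azimuthal_descriptor` and the parameters `num_sectors`, `num_rings`, and `sigma` are hypothetical. How the grid "adapts" is also an assumption here: ring spacing is scaled to each scan's own maximum horizontal range. Each cell is assumed to store the maximum height above the scan's lowest point, and rows are assumed to be weighted by a Gaussian of the ring index.

```python
import math


def azimuthal_descriptor(points, num_sectors=60, num_rings=20, sigma=5.0):
    """Hypothetical 1D azimuthal descriptor (one value per azimuth sector).

    points: iterable of (x, y, z) tuples in the sensor frame.
    Assumptions (not from the paper): adaptive ring spacing = scan's own max
    horizontal range / num_rings; cell value = max height above the lowest
    point; Gaussian weights over the ring index favour near rings.
    """
    pts = [(x, y, z) for x, y, z in points if math.hypot(x, y) > 1e-6]
    if not pts:
        return [0.0] * num_sectors
    max_range = max(math.hypot(x, y) for x, y, _ in pts)
    z_min = min(z for _, _, z in pts)

    # grid[ring][sector] holds the max height above z_min seen in that cell
    grid = [[0.0] * num_sectors for _ in range(num_rings)]
    for x, y, z in pts:
        theta = math.atan2(y, x) % (2 * math.pi)
        sector = min(int(theta / (2 * math.pi) * num_sectors), num_sectors - 1)
        ring = min(int(math.hypot(x, y) / max_range * num_rings), num_rings - 1)
        grid[ring][sector] = max(grid[ring][sector], z - z_min)

    # Gaussian-weighted summation down each azimuth column
    weights = [math.exp(-(i ** 2) / (2 * sigma ** 2)) for i in range(num_rings)]
    desc = [sum(weights[i] * grid[i][s] for i in range(num_rings))
            for s in range(num_sectors)]

    # L2-normalise so scans with different densities remain comparable
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc
```

A useful property of this layout is that rotating the sensor about its vertical axis by a whole number of sectors only circularly shifts the descriptor, which is what makes yaw recoverable during matching.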
Key results
- Significantly outperforms state-of-the-art descriptors on public and real-world multi-modal LiDAR datasets
- Robustly handles variations in HFOV, VFOV, mounting height, and point density across seven different LiDAR sensors
- Achieves accurate coarse-to-fine loop closure detection with simultaneous 4-DoF relative pose estimation
- Integrates into collaborative SLAM frameworks for cross-agent localization
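The coarse retrieval step and the yaw component of the 4-DoF pose estimate can be illustrated together. This sketch assumes that "vector matching" means comparing 1D descriptors under every circular shift. The best shift then doubles as a coarse yaw offset. The functions `shift_match` and `retrieve`, the cosine-similarity score, and `top_k` are all hypothetical choices, not the paper's stated method.

```python
import math


def shift_match(query, candidate):
    """Return (best_cosine_similarity, best_shift) over all circular shifts.

    A shift k means candidate[(i + k) % n] is aligned with query[i];
    k * 360 / n is then a coarse yaw offset in degrees (sign convention assumed).
    """
    n = len(query)
    qn = math.sqrt(sum(v * v for v in query))
    cn = math.sqrt(sum(v * v for v in candidate))
    if qn == 0 or cn == 0:
        return 0.0, 0
    best_sim, best_shift = -1.0, 0
    for k in range(n):
        dot = sum(query[i] * candidate[(i + k) % n] for i in range(n))
        sim = dot / (qn * cn)
        if sim > best_sim:
            best_sim, best_shift = sim, k
    return best_sim, best_shift


def retrieve(query, database, top_k=3):
    """Rank stored descriptors by shift-invariant similarity.

    Returns up to top_k tuples (database_index, similarity, yaw_degrees).
    """
    n = len(query)
    scored = []
    for idx, cand in enumerate(database):
        sim, shift = shift_match(query, cand)
        scored.append((idx, sim, shift * 360.0 / n))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]
```

Brute-force shifting costs O(n²) per pair; an FFT-based circular cross-correlation would be the usual speed-up for long descriptors or large databases.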
Why it matters
It enables multi-robot systems to reliably share localization data across heterogeneous LiDAR hardware, advancing robust collaborative mapping in diverse environments.
Abstract
Existing place recognition descriptors developed for single-agent SLAM struggle with the differences between multi-modal LiDARs in collaborative SLAM. To overcome this, we propose an online place recognition method for multi-modal LiDARs. This method introduces a dual-view combination descriptor, termed DVMM, by separately encoding azimuthal and vertical scene information. The place recognition process consists of two stages: loop closure detection and verification. In the detection stage, point clouds are projected onto an adaptive grid and a 1D azimuthal descriptor is generated via Gaussian-weighted column summation. The azimuthal descriptor is used to retrieve loop candidates through vector matching. In the verification stage, point clouds within a fixed height range are encoded as a binary occupancy image, which serves as the cross-section descriptor. Accurate loop closures are determined by performing image matching on the cross-section descriptors. We evaluate the proposed method on both public and real-world datasets encompassing a total of seven LiDAR sensors. The results demonstrate that DVMM significantly outperforms state-of-the-art descriptors in handling multi-modal LiDAR data and is compatible with collaborative SLAM systems.
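The verification stage can be sketched as two pieces: rasterising a fixed height band into a binary occupancy image, and scoring a candidate by image matching. This is an illustrative sketch only. The names `cross_section_image` and `match_images`, the grid parameters `size` and `resolution`, the band limits `z_low`/`z_high`, the search radius `max_shift`, and the use of IoU over a brute-force translation search are all assumptions. The paper's exact matching rule is not given in the abstract. The sketch also assumes that the candidate has already been rotated by the coarse yaw from the detection stage.

```python
import math


def cross_section_image(points, z_low, z_high, size=40, resolution=0.5):
    """Binary occupancy image of points with z in [z_low, z_high].

    Hypothetical layout: size x size cells, `resolution` metres per cell,
    centred on the sensor; rows index y and columns index x.
    """
    img = [[0] * size for _ in range(size)]
    half = size * resolution / 2.0
    for x, y, z in points:
        if z_low <= z <= z_high:
            col = math.floor((x + half) / resolution)
            row = math.floor((y + half) / resolution)
            if 0 <= row < size and 0 <= col < size:
                img[row][col] = 1
    return img


def match_images(a, b, max_shift=4):
    """Brute-force translation search; returns (best_iou, d_row, d_col).

    Image b is shifted by (d_row, d_col) cells; cells shifted out of the
    frame are dropped. IoU is an assumed scoring rule for this sketch.
    """
    size = len(a)
    occ_a = {(r, c) for r in range(size) for c in range(size) if a[r][c]}
    occ_b = [(r, c) for r in range(size) for c in range(size) if b[r][c]]
    best = (0.0, 0, 0)
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            shifted = {(r + dr, c + dc) for r, c in occ_b
                       if 0 <= r + dr < size and 0 <= c + dc < size}
            union = len(occ_a | shifted)
            if union == 0:
                continue
            iou = len(occ_a & shifted) / union
            if iou > best[0]:
                best = (iou, dr, dc)
    return best
```

A high IoU accepts the loop, and the winning cell offset times `resolution` gives an x-y translation estimate. Combined with the coarse yaw, this covers three of the four degrees of freedom. How the method recovers the remaining one is not described in the abstract, so this sketch leaves it out.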