Angle-I2P: Angle-Consistent-Aware Hierarchical Attention for Cross-Modality Outlier Rejection
Muyao Peng, Shun Zou, Pei An, You Yang, Qiong Liu
AI summary
Problem
Existing deep learning-based image-to-point cloud registration methods struggle to reject outliers when initial inlier ratios are low or when monocular depth estimation introduces scale ambiguity, leading to inaccurate pose estimation.
Approach
The method converts images to point clouds using monocular depth, applies a scale-invariant angle-based spatial consistency metric to identify geometrically consistent matches, and refines features through a global-to-local hierarchical attention mechanism to filter outliers.
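To illustrate why an angle-based check is insensitive to depth scale, the following minimal sketch compares the angle formed by a triplet of matched points on each side of the correspondence set. It is not the authors' formulation: the function names (`angle_at`, `angle_residual`, `similarity_transform`), the triplet-based residual, and the toy data are all assumptions made for illustration. The sketch assumes the image-side points have already been lifted to 3D and differ from the point cloud by an unknown scale, a rotation, and a translation.

```python
import math


def angle_at(a, b, c):
    """Angle in radians at vertex a between the rays a->b and a->c.

    Assumes a, b, and c are three distinct 3D points.
    """
    u = [b[n] - a[n] for n in range(3)]
    v = [c[n] - a[n] for n in range(3)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    cos_angle = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def angle_residual(src, dst, i, j, k):
    """Absolute difference of the angle at vertex i for the triplet (i, j, k).

    Hypothetical consistency score: near zero when the three matches agree
    with a common similarity transform, larger when one of them is an outlier.
    """
    return abs(angle_at(src[i], src[j], src[k]) - angle_at(dst[i], dst[j], dst[k]))


def similarity_transform(p, scale, offset):
    """Rotate p by 90 degrees about the z axis, scale it, then translate it."""
    x, y, z = p
    rotated = (-y, x, z)
    return tuple(scale * r + o for r, o in zip(rotated, offset))


# Toy image-side points (lifted with an arbitrary depth scale) and their matches.
src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
dst = [similarity_transform(p, 2.5, (1.0, 2.0, 3.0)) for p in src]

# All three matches are correct: the residual vanishes despite the 2.5x scale.
clean = angle_residual(src, dst, 0, 1, 2)

# Replace one match with a wrong point: the residual becomes large.
corrupted = list(dst)
corrupted[2] = (1.0, 5.0, 3.0)
noisy = angle_residual(src, corrupted, 0, 1, 2)
```

A pairwise distance check would flag every match in `dst` as inconsistent because of the 2.5x scale, whereas the angle residual stays at zero for the correct triplet and rises to 90 degrees for the corrupted one.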
Key results
- Achieves state-of-the-art Inlier Ratio and Registration Recall on three benchmarks
- Introduces a scale-invariant angular consistency metric to neutralize depth scale ambiguity
- Demonstrates superior robustness in low-inlier-ratio and cross-scene scenarios
- Outperforms GraphI2P and GoMatch by up to 5.9% in Inlier Ratio and 6.6% in Registration Recall
Why it matters
Provides a robust, geometry-aware solution for reliable cross-modality registration, directly benefiting robotic manipulation, SLAM, and autonomous navigation in real-world environments.
Abstract
Image-to-point-cloud registration (I2P) is a fundamental task in robotic applications such as manipulation, grasping, and localization. Existing deep learning-based I2P methods seek to align image and point cloud features in a learned representation space to establish correspondences, and have achieved promising results. However, when the inlier ratio of the initial matching pairs is low, conventional Perspective-n-Point (PnP) methods may struggle to achieve accurate results. To address this limitation, we propose Angle-I2P, an outlier rejection network that leverages angle-consistent geometric constraints and hierarchical attention. First, we design a scale-invariant, cross-modality geometric constraint based on angular consistency. This explicit geometric constraint guides the model in distinguishing inliers from outliers. Furthermore, we propose a global-to-local hierarchical attention mechanism that effectively filters out geometrically inconsistent matches under rigid transformation, thereby improving the Inlier Ratio (IR) and Registration Recall (RR). Experimental results demonstrate that our method achieves state-of-the-art performance on 7Scenes, RGBD Scenes V2, and a self-collected dataset, with consistent improvements across all benchmarks.
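As a rough illustration of the Inlier Ratio metric mentioned above, the sketch below counts the fraction of putative matches that land within a distance threshold once the ground-truth rigid transform is applied. This follows a common convention in registration work and is an assumption here: the abstract does not state the exact definition or threshold used, and the function name `inlier_ratio`, the 0.05 threshold, and the toy data are hypothetical. It also assumes the image-side points have been lifted to metric 3D for evaluation.

```python
import math


def inlier_ratio(src, dst, rotation, translation, threshold):
    """Fraction of matches (p, q) with ||R p + t - q|| below the threshold.

    rotation is a 3x3 nested sequence, translation a length-3 sequence.
    Returns 0.0 for an empty match set.
    """
    if not src:
        return 0.0
    inliers = 0
    for p, q in zip(src, dst):
        # Apply the ground-truth rigid transform to the source point.
        moved = [
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        ]
        if math.dist(moved, q) < threshold:
            inliers += 1
    return inliers / len(src)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = (0.0, 0.0, 1.0)

src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# Three matches agree with the transform (one with 2 cm of noise); the last is wrong.
dst = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.02), (0.5, 0.0, 2.0)]

ratio = inlier_ratio(src, dst, identity, shift, threshold=0.05)  # 0.75
```

An outlier rejection stage is judged by how much it raises this ratio before pose estimation: the fewer wrong matches reach the PnP solver, the less it depends on robust sampling to recover the pose.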