ICRA 2026

PG-Match: A Pose-Guided Generalizable Framework for Semi-Dense Feature Matching

Jiayi Pei, Peili Song, Chenyang Zhao, Lei Sun, Jingtai Liu


AI summary


PG-Match replaces scarce depth supervision with pose guidance to achieve robust, generalizable semi-dense feature matching without keypoint detectors.

Feature matching, Pose supervision, Detector-free, Semi-dense correspondence, Differentiable outlier rejection, Structure from Motion

Problem

Existing detector-free feature matching methods rely on ground-truth depth data for supervision, which is scarce and limits generalization across diverse environments. Furthermore, traditional match supervision lacks global geometric consistency, reducing inlier ratios and degrading downstream performance.

Approach

PG-Match leverages ground-truth camera poses as supervision instead of depth, enabling end-to-end training via a Differentiable Outlier Rejection Module (DORM). It combines this with a confidence-guided coarse-to-fine matching strategy to efficiently refine semi-dense correspondences while maintaining global consistency.
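One common way to supervise matches with a pose instead of depth is the epipolar constraint: for a ground-truth relative pose (R, t), a correct match in normalized coordinates satisfies x1ᵀ E x0 = 0 with E = [t]× R. The sketch below penalizes each predicted match by its Sampson error under that E. It is an illustration under stated assumptions, not PG-Match's actual loss: it assumes intrinsics have already been removed from the points, and the function names and confidence-weighted averaging are hypothetical.

```python
# Hypothetical sketch: pose-based match supervision via the Sampson epipolar error.
# Assumes points are already normalized by the camera intrinsics (K^-1 applied).

Matrix = list[list[float]]
Point = tuple[float, float]


def skew(t: list[float]) -> Matrix:
    """Cross-product matrix [t]_x, so that [t]_x v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def essential_from_pose(R: Matrix, t: list[float]) -> Matrix:
    """E = [t]_x R for a relative pose mapping camera-0 points into camera 1 (X1 = R X0 + t)."""
    return matmul(skew(t), R)


def sampson_error(E: Matrix, x0: Point, x1: Point) -> float:
    """First-order approximation of the geometric epipolar error of one match."""
    p0 = (x0[0], x0[1], 1.0)
    p1 = (x1[0], x1[1], 1.0)
    Ep0 = [sum(E[i][k] * p0[k] for k in range(3)) for i in range(3)]   # epipolar line in image 1
    Etp1 = [sum(E[k][i] * p1[k] for k in range(3)) for i in range(3)]  # epipolar line in image 0
    residual = sum(p1[i] * Ep0[i] for i in range(3))                   # x1^T E x0
    denom = Ep0[0] ** 2 + Ep0[1] ** 2 + Etp1[0] ** 2 + Etp1[1] ** 2
    return residual ** 2 / max(denom, 1e-12)


def pose_supervised_loss(E: Matrix, matches: list[tuple[Point, Point]], weights: list[float]) -> float:
    """Confidence-weighted mean Sampson error; needs only (R, t), no ground-truth depth."""
    total = sum(weights)
    weighted = sum(w * sampson_error(E, a, b) for (a, b), w in zip(matches, weights))
    return weighted / max(total, 1e-12)
```

For example, with R as the identity and t = (1, 0, 0), the match (0, 0) → (1, 0) lies on its epipolar line and scores 0, while (0, 0) → (1, 0.5) scores 0.125. A real training loop would compute the same quantities in an autodiff framework so gradients reach the matcher; plain Python is used here only to show the arithmetic.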

Key results

Outperforms state-of-the-art methods in pose accuracy on MegaDepth-1500
Demonstrates strong cross-dataset generalization on PhotoTourism
Improves accuracy and completeness in downstream SfM pipelines
Increases inlier ratios through differentiable outlier rejection
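The inlier ratio is usually measured by thresholding each match's epipolar error, and a hard threshold has no useful gradient. A standard relaxation, similar in spirit to soft inlier counting in DSAC-style methods, replaces the hard count with sigmoid weights whose sharpness is set by a temperature. The sketch below is a hypothetical illustration of that general idea, not the paper's DORM; the threshold and temperature values are assumptions.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hard_inlier_ratio(errors: list[float], threshold: float) -> float:
    """Fraction of matches whose epipolar error is below the threshold (not differentiable)."""
    if not errors:
        return 0.0
    return sum(e < threshold for e in errors) / len(errors)


def soft_inlier_weights(errors: list[float], threshold: float, temperature: float) -> list[float]:
    """Smooth per-match inlier scores in (0, 1); an error equal to the threshold scores 0.5."""
    return [sigmoid((threshold - e) / temperature) for e in errors]


def soft_inlier_ratio(errors: list[float], threshold: float, temperature: float) -> float:
    """Differentiable surrogate of the inlier ratio; approaches the hard ratio as temperature -> 0."""
    if not errors:
        return 0.0
    weights = soft_inlier_weights(errors, threshold, temperature)
    return sum(weights) / len(weights)
```

With errors [0.0, 0.5, 2.0, 3.0] and a threshold of 1.0, the hard ratio is 0.5, and the soft ratio approaches 0.5 as the temperature shrinks. The weights can also scale each match's contribution to a loss, so high-error matches are down-weighted smoothly rather than discarded.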

Why it matters

Enables reliable, depth-independent feature matching for real-world 3D reconstruction and visual localization where ground-truth depth is unavailable.

Abstract

Feature matching is a fundamental technique in visual perception, essential for tasks such as 3D reconstruction, SLAM, and visual localization. Existing detector-free methods often struggle to generalize because they rely on depth data, which is unavailable in many datasets. We propose PG-Match, a detector-free feature matching framework that uses pose supervision instead of depth-based supervision, thereby improving generalization across diverse environments. We further introduce a Differentiable Outlier Rejection Module (DORM) to enhance global consistency and increase the inlier ratio. For efficiency, we employ a coarse-to-fine matching strategy in which specially designed confidence scores guide the sampling process. This ensures efficient convergence and avoids local optima. Experiments on the widely used MegaDepth-1500 dataset show that PG-Match consistently outperforms state-of-the-art approaches, highlighting the effectiveness of its pose-guided design. Additionally, experiments on the depth-free PhotoTourism dataset further evaluate the generalization of PG-Match, and its performance is also assessed in a downstream Structure from Motion (SfM) task.
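The abstract says confidence scores guide a coarse-to-fine sampling step but does not give their form. As an illustration only, the sketch below assumes a LoFTR-style pipeline: coarse matches are ranked by confidence and at most k above a floor are kept, then each survivor is refined to sub-pixel precision by a soft-argmax (expectation) over a local fine-level score window. The names select_coarse, refine, k, min_conf, stride, and temperature are hypothetical and not taken from the paper.

```python
import math


def select_coarse(matches, confidences, k, min_conf):
    """Keep at most k coarse matches, highest confidence first, dropping any below min_conf."""
    order = sorted(range(len(matches)), key=lambda i: confidences[i], reverse=True)
    return [matches[i] for i in order if confidences[i] >= min_conf][:k]


def soft_argmax_offset(window, temperature=1.0):
    """Expected (dx, dy) under a softmax over a square (2r+1)x(2r+1) fine-level score window."""
    size = len(window)
    r = size // 2
    scores = [s / temperature for row in window for s in row]
    peak = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    z = sum(exps)
    dx = dy = 0.0
    for idx, e in enumerate(exps):
        y, x = divmod(idx, size)                # row-major position inside the window
        dx += (x - r) * e / z
        dy += (y - r) * e / z
    return dx, dy


def refine(coarse_xy, window, stride, temperature=1.0):
    """Lift a coarse grid cell to fine pixels (window assumed centred there) and add the offset."""
    dx, dy = soft_argmax_offset(window, temperature)
    cx, cy = coarse_xy
    return cx * stride + dx, cy * stride + dy
```

For instance, select_coarse(["a", "b", "c", "d"], [0.9, 0.2, 0.7, 0.95], k=2, min_conf=0.5) keeps ["d", "a"]. Lowering the temperature sharpens the softmax toward a hard argmax, while a higher value spreads weight across the window and smooths the refined position.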

Index terms

Deep Learning for Visual Perception; Computer Vision for Automation; Visual Learning