Leveraging Geometric Priors for Unaligned Scene Change Detection
Ziling Liu, Ziwei Chen, Mingqi Gao, Jinyu Yang, Feng Zheng
AI summary
Problem
Existing unaligned scene change detection methods rely on 2D visual cues, which fail under large viewpoint changes. Lacking generalizable multi-view knowledge, they struggle to establish robust correspondences, identify visual overlaps, and handle occlusions.
Approach
The authors use a Geometric Foundation Model to estimate depth and camera poses, from which they derive explicit geometric correspondences and occlusion masks. These priors guide a pretrained visual foundation model (SAM) to predict change masks in a training-free manner, with no task-specific training data required.
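To make the idea of depth- and pose-based correspondence concrete, the following is a minimal sketch of standard pinhole reprojection with a depth-consistency occlusion test. It is not the authors' implementation. The function name `warp_and_occlusion`, the tolerance `rel_tol`, the assumption that both views share one intrinsic matrix `K`, and the convention that `T_ab` maps camera-A coordinates to camera-B coordinates are all illustrative choices.

```python
import numpy as np

def warp_and_occlusion(depth_a, depth_b, K, T_ab, rel_tol=0.05):
    """Project every pixel of view A into view B (hypothetical helper).

    depth_a, depth_b: (H, W) depth maps along the camera z-axis.
    K: (3, 3) pinhole intrinsics, assumed shared by both views.
    T_ab: (4, 4) rigid transform from camera-A to camera-B coordinates.
    Returns (uv_b, overlap, occluded):
      uv_b     (H, W, 2) sub-pixel location of each A pixel in B,
      overlap  (H, W) True where the point lands inside B's image,
      occluded (H, W) True where B sees a nearer surface at that location.
    """
    H, W = depth_a.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Back-project A's pixels to 3D points in camera-A coordinates.
    pts_a = (pix @ np.linalg.inv(K).T) * depth_a[..., None]

    # Move the points into camera-B coordinates.
    pts_b = pts_a @ T_ab[:3, :3].T + T_ab[:3, 3]
    z_b = pts_b[..., 2]

    # Perspective projection into B; guard against non-positive depth.
    safe_z = np.where(z_b > 1e-6, z_b, 1.0)
    uv_b = (pts_b @ K.T)[..., :2] / safe_z[..., None]

    # Visual overlap: in front of camera B and inside its image bounds.
    ui = np.rint(uv_b[..., 0]).astype(int)
    vi = np.rint(uv_b[..., 1]).astype(int)
    overlap = (z_b > 1e-6) & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)

    # Occlusion: the reprojected point lies behind the surface B observes.
    sampled = np.zeros(depth_a.shape, dtype=np.float64)
    sampled[overlap] = depth_b[vi[overlap], ui[overlap]]
    occluded = overlap & (z_b > sampled * (1.0 + rel_tol))
    return uv_b, overlap, occluded
```

In a pipeline of this kind, pixels that are outside `overlap` or flagged as `occluded` would be excluded before any appearance comparison, so that missing or hidden content is not mistaken for a change. Nearest-neighbour depth sampling is used here for brevity; bilinear sampling would be a natural refinement.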
Key results
- First application of Geometric Foundation Model priors to unaligned scene change detection
- Training-free framework that eliminates reliance on large-scale annotated datasets
- Superior and robust F1-scores across PSCD, ChangeSim, and PASLCD datasets
- Explicit occlusion detection and geometric correspondence outperform 2D flow-based baselines
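For reference, the F1-score of a predicted binary change mask against a ground-truth mask can be computed as in the sketch below. The function name `mask_f1` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; how the paper aggregates scores (per image or over a whole dataset) is not stated here.

```python
import numpy as np

def mask_f1(pred, gt):
    """F1-score between two binary masks of the same shape (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.count_nonzero(pred & gt)    # changed pixels correctly detected
    fp = np.count_nonzero(pred & ~gt)   # false alarms
    fn = np.count_nonzero(~pred & gt)   # missed changes
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0
```

This is the harmonic mean of precision and recall written in terms of pixel counts, F1 = 2TP / (2TP + FP + FN).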
Why it matters
Provides a reliable, zero-shot solution for real-world robotic and autonomous driving applications where camera viewpoints and lighting frequently change.
Abstract
Unaligned Scene Change Detection aims to detect scene changes between image pairs captured at different times without assuming viewpoint alignment. To handle viewpoint variations, current methods rely solely on 2D visual cues to establish cross-image correspondence to assist change detection. However, large viewpoint changes can alter visual observations, causing appearance-based matching to drift or fail. Additionally, supervision limited to 2D change masks from small-scale SCD datasets restricts the learning of generalizable multi-view knowledge, making it difficult to reliably identify visual overlaps and handle occlusions. This lack of explicit geometric reasoning represents a critical yet overlooked limitation. In this work, we introduce geometric priors for the first time to address the core challenges of unaligned SCD, for reliable identification of visual overlaps, robust correspondence establishment, and explicit occlusion detection. Building on these priors, we propose a training-free framework that integrates them with the powerful representations of a visual foundation model to enable reliable change detection under viewpoint misalignment. Through extensive evaluation on the PSCD, ChangeSim, and PASLCD datasets, we demonstrate that our approach achieves superior and robust performance. Our code will be released at https://github.com/ZilingLiu/GeoSCD.