Sat-RoMa: Cross-Scale Dense Matching for Multi-Temporal UAV-To-Orthophoto Registration
Maciej Krupka, Jan Węgrzynowski, Piotr Skrzypczynski
AI summary
Problem
Current feature matchers fail to accurately localize downward-facing drone cameras against reference satellite maps due to severe seasonal appearance changes and extreme scale discrepancies. This gap prevents reliable absolute positioning for UAVs in GPS-denied environments.
Approach
Sat-RoMa adapts the RoMa architecture by freezing a satellite-pretrained DinoV3 encoder and training end-to-end on cross-seasonal image pairs. It explicitly handles matching a small drone query to a 4× larger reference map.
Key results
- Achieves 11.2% scale error versus over 100% for baselines
- Reduces reprojection error to 42.3 pixels, a 6–7× improvement
- Lowers rotational error to 11.1° while maintaining geometric fidelity
- Demonstrates robustness to severe seasonal and structural appearance changes
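The summary does not say how the scale, rotation, and reprojection errors are computed. As a rough sketch under one plausible set of definitions, the three numbers can be derived by comparing an estimated homography with a ground-truth one. The function names, the use of the upper-left 2×2 block for scale and rotation, and the choice of image corners for the reprojection error are all assumptions made for illustration, not the authors' evaluation protocol.

```python
import math


def apply_h(H, pt):
    """Project a 2D point through a 3x3 homography given as nested lists."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def scale_and_rotation(H):
    """Approximate scale and rotation (degrees) from the 2x2 linear part.

    Assumption: perspective terms are small, so the upper-left block
    behaves like a similarity transform.
    """
    a, b = H[0][0] / H[2][2], H[0][1] / H[2][2]
    c, d = H[1][0] / H[2][2], H[1][1] / H[2][2]
    scale = math.sqrt(abs(a * d - b * c))           # sqrt of area change
    angle = math.degrees(math.atan2(c - b, a + d))  # closest rotation angle
    return scale, angle


def homography_errors(H_est, H_gt, width, height):
    """Return (scale error in %, rotation error in degrees, reprojection error in px)."""
    s_est, r_est = scale_and_rotation(H_est)
    s_gt, r_gt = scale_and_rotation(H_gt)
    scale_err = abs(s_est - s_gt) / s_gt * 100.0
    # Wrap the angular difference into [-180, 180) before taking the magnitude.
    rot_err = abs((r_est - r_gt + 180.0) % 360.0 - 180.0)
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    reproj = sum(math.dist(apply_h(H_est, c), apply_h(H_gt, c))
                 for c in corners) / len(corners)
    return scale_err, rot_err, reproj
```

Under these definitions, a scale error above 100% means the estimated homography is off by more than a factor of two in scale relative to the ground truth.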
Why it matters
Provides a critical drift-correction mechanism for UAVs operating in GPS-denied environments, advancing autonomous search and rescue and mapping applications.
Abstract
Reliable Global Navigation Satellite System (GNSS) signals are increasingly denied or jammed in real-world applications, such as search and rescue operations. In such scenarios, Unmanned Aerial Vehicles (UAVs) must rely on downward-facing cameras for absolute localization against reference satellite maps. While Visual Inertial Odometry (VIO) is highly accurate locally, it inevitably accumulates drift over time. Localizing a drone image against a pre-existing satellite map (e.g., Google Earth) via homography estimation is a viable solution, but it is severely challenged by seasonal variations, construction, and vegetation changes. In this paper, we propose Sat-RoMa, an end-to-end robust dense feature matcher adapted from the state-of-the-art RoMa architecture. By utilizing a frozen, pre-trained DinoV3 encoder specifically tuned for satellite imagery, and formulating the task as matching a small drone image to a 4× larger reference map, Sat-RoMa explicitly handles scale discrepancies and temporal appearance changes. Preliminary results demonstrate that Sat-RoMa significantly outperforms baselines like LoFTR and LightGlue, achieving an 11.2% scale error compared to over 100% for existing methods, paving the way for robust GPS-denied UAV navigation.
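To make the homography-based localization step concrete, the sketch below fits a homography to four drone-to-map correspondences and projects the drone image center into map pixel coordinates. It is a minimal illustration, not the Sat-RoMa pipeline: a dense matcher would supply many correspondences and a robust estimator (for example RANSAC) would normally be used, whereas this code assumes exactly four clean matches. All function names and the example coordinates are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def homography_from_4_points(src, dst):
    """Fit H (with H[2][2] fixed to 1) so that each src point maps to its dst point."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(H, pt):
    """Map a drone-image pixel into reference-map pixel coordinates."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical example: a 512x512 drone frame matched into a larger map tile.
drone_corners = [(0, 0), (512, 0), (512, 512), (0, 512)]
map_corners = [(700, 900), (956, 900), (956, 1156), (700, 1156)]
H = homography_from_4_points(drone_corners, map_corners)
center_in_map = project(H, (256, 256))
```

If the reference map is georeferenced, the projected center can then be converted from map pixels to geographic coordinates, which gives the absolute position fix used to correct VIO drift.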