ICRA 2026

BEV-Patch-PF: Particle Filtering with BEV-Aerial Feature Matching for Off-Road Geo-Localization

Dongmyeong Lee, Jesse Quattrociocchi, Christian Ellis, Rwik Rana, Amanda Adkins, Adam Uccello, Garrett Warnell, Joydeep Biswas

AI summary

Key figure (auto-extracted from paper)

BEV-Patch-PF enables robust, real-time, GPS-free off-road robot localization by sequentially matching ground-level bird's-eye-view features to aerial imagery within a particle filter.

Cross-view geo-localization, Particle filtering, Bird's-eye view, Off-road robotics, GPS-free navigation, Aerial feature matching

Problem

Existing cross-view geo-localization methods struggle with large viewpoint differences, environmental variability, and lack of temporal consistency, making them unreliable in unstructured, GPS-denied off-road environments.

Approach

The system uses a particle filter to continuously update pose hypotheses by matching learned bird's-eye-view features from onboard RGB-D cameras with corresponding patches from a geo-referenced aerial map, while adaptively downweighting uncertain observations.
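The measurement update described above can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the names `crop_patch`, `match_log_likelihood`, `update_log_weights`, and `temperature` are hypothetical, the feature maps are small scalar grids rather than learned multi-channel features, yaw is ignored, the match score is assumed to be a negative sum of squared differences, and a scalar temperature stands in for the learned frame-level uncertainty.

```python
import math


def crop_patch(feature_map, row, col, size):
    """Return the size x size window whose top-left corner is (row, col)."""
    return [r[col:col + size] for r in feature_map[row:row + size]]


def match_log_likelihood(bev, patch):
    """Assumed score: negative sum of squared feature differences."""
    return -sum((b - p) ** 2
                for bev_row, patch_row in zip(bev, patch)
                for b, p in zip(bev_row, patch_row))


def update_log_weights(log_weights, log_likelihoods, temperature=1.0):
    """Bayes update in log space; temperature > 1 flattens the likelihood."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    unnormalized = [w + l / temperature
                    for w, l in zip(log_weights, log_likelihoods)]
    # Log-sum-exp normalization keeps the update numerically stable.
    peak = max(unnormalized)
    log_norm = peak + math.log(sum(math.exp(u - peak) for u in unnormalized))
    return [u - log_norm for u in unnormalized]


# Toy 4x4 "aerial feature map" and a BEV observation taken at (row=1, col=2).
aerial = [[0.0, 1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0, 7.0],
          [8.0, 9.0, 10.0, 11.0],
          [12.0, 13.0, 14.0, 15.0]]
bev = crop_patch(aerial, 1, 2, 2)

# Three particle hypotheses (row, col) with a uniform prior.
particles = [(0, 0), (1, 2), (2, 1)]
prior = [math.log(1.0 / len(particles))] * len(particles)
log_lik = [match_log_likelihood(bev, crop_patch(aerial, r, c, 2))
           for r, c in particles]

confident = update_log_weights(prior, log_lik, temperature=1.0)
uncertain = update_log_weights(prior, log_lik, temperature=20.0)
best = max(range(len(particles)), key=lambda i: confident[i])
```

In this sketch a larger temperature leaves the ranking of particles unchanged but spreads the posterior weight more evenly, which is the intuition behind downweighting unreliable observations.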

Key results

9.7× lower ATE on seen routes and 6.6× lower ATE on unseen routes compared to a retrieval-based baseline
Robust localization under partial canopy cover and shadowing
Real-time operation at 10 Hz on an NVIDIA Tesla T4
Introduction of the UT-SARA-GQ dataset for benchmarking cross-view localization in challenging off-road conditions
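For readers unfamiliar with the metric, absolute trajectory error (ATE) is commonly computed as the root-mean-square position error between time-aligned estimated and ground-truth poses. The sketch below assumes that definition, 2D positions already expressed in the same geo-referenced frame, and no extra alignment step; the paper's exact evaluation protocol is not stated in this summary, and the function name and trajectories are hypothetical.

```python
import math


def absolute_trajectory_error(estimated, ground_truth):
    """RMSE of 2D position error over time-aligned (x, y) pairs."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [(ex - gx) ** 2 + (ey - gy) ** 2
                      for (ex, ey), (gx, gy) in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up trajectories in metres, purely for illustration.
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
estimate = [(0.0, 0.0), (1.0, 3.0), (2.0, 4.0)]

ate = absolute_trajectory_error(estimate, ground_truth)
```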

Why it matters

Provides a practical, deployment-ready solution for autonomous ground robots navigating complex, GPS-denied terrains where traditional localization fails.

Abstract

Localizing ground robots against aerial imagery provides a critical capability for autonomous navigation, especially in environments where GPS is unreliable or unavailable. This task is challenging due to large viewpoint differences and substantial environmental variability. Most prior methods localize each frame independently, using either global-descriptor retrieval or spatial feature alignment, which leaves them vulnerable to ambiguity and multi-modal pose hypotheses. While sequential reasoning can mitigate this uncertainty, adapting existing per-frame pipelines for sequential use introduces unfavorable trade-offs among accuracy, memory, and computation that limit their practical deployment. We propose BEV-Patch-PF, a vision-only, GPS-free sequential geo-localization system that integrates particle filtering with learned bird’s-eye-view (BEV) and aerial feature maps. For each 3-DoF particle pose hypothesis, we crop the corresponding patch from an aerial feature map computed from a local aerial image centered on the predicted pose. The resulting BEV–aerial feature match defines a per-particle log-likelihood for particle-filter updates. In addition, we learn a frame-level uncertainty estimate that adaptively flattens the observation likelihood for unreliable observations, preventing overconfident particle collapse in ambiguous regions. On two real-world off-road datasets, our method achieves 9.7× lower absolute trajectory error (ATE) on seen routes and 6.6× lower ATE on unseen routes than a retrieval-based baseline, while remaining robust under partial canopy cover and shadowing. The system runs in real time at 10 Hz on an NVIDIA Tesla T4, enabling practical robot deployment.

I. INTRODUCTION

High-quality global localization in a geo-referenced frame allows robots to leverage aerial imagery, which can be used to provide improved long-range off-road planning and navigation around hazards such as cliffs and rivers.
Fig. 1. Visualization of BEV-Patch-PF inputs and outputs. Top (Inputs): (a) onboard RGB image $I$, (b) depth image $D$, and (c) a local aerial orthophoto $M[\hat{x}_{L|L-1}]$, where the white arrow indicates the ground-truth pose. Bottom (Outputs): (d) the BEV distinctiveness map $C$, (e) the corresponding feature map $G$, and (f) the aerial feature map $F$. The white box in (f) illustrates one representative sampled patch; during inference, BEV-Patch-PF evaluates a patch for each particle hypothesis.

Although visual and LiDAR-inertial odometry provide local pose estimates, they accumulate drift without global fixes, leading to errors that compromise downstream planning. Cross-view geo-localization addresses the lack of global position fixes by estimating a robot’s 3-DoF pose in a UTM frame by matching ground-level images with geo-referenced aerial imagery. However, this task is inherently difficult due to potentially large viewpoint differences between the onboard and aerial sensors. This problem is especially challenging in unstructured off-road environments, where the absence of man-made landmarks and the presence of terrain irregularities and tree canopy exacerbate the visual mismatch and remove many of the cues that conventional methods rely on [1], [2]. Recent deep learning approaches typically tackle this problem frame-by-frame, falling into two main categories: retrieval-based methods [3], [4], [5], [6], [7] that learn global descriptors for ground and aerial images, and spatial feature-alignment methods [8], [9], [10], [11], [12] that infer poses by aligning features in a shared representation. Per-frame localization, however, considers only a single observation at a time, making it vulnerable to ambiguity and multi-modal solutions. In off-road settings, this can lead to catastrophic pose jumps caused by visually similar map regions or sensor occlusions. Sequential localization mitigates these issues by enforcing temporal consistency.
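To make the ambiguity argument concrete, here is a toy example that is deliberately much simpler than the paper's setting: a histogram (grid) Bayes filter on a five-cell cyclic 1D world rather than a particle filter over 3-DoF poses. The world layout, the `sense` and `move` helpers, and the hit/miss probabilities are all assumptions made for illustration. A single observation of "tree" leaves two equally likely cells, while adding a known motion step and a second observation collapses the belief onto one cell.

```python
def sense(belief, world, observation, p_hit=0.9, p_miss=0.1):
    """Measurement update: reweight each cell by how well it explains the observation."""
    weighted = [b * (p_hit if cell == observation else p_miss)
                for b, cell in zip(belief, world)]
    total = sum(weighted)
    return [w / total for w in weighted]


def move(belief, step):
    """Motion update: exact cyclic shift of the belief by `step` cells."""
    n = len(belief)
    return [belief[(i - step) % n] for i in range(n)]


# Two cells look like "tree", so one frame alone cannot tell them apart.
world = ["tree", "rock", "tree", "grass", "rock"]
uniform = [1.0 / len(world)] * len(world)

per_frame = sense(uniform, world, "tree")
sequential = sense(move(per_frame, 1), world, "grass")
most_likely_cell = max(range(len(world)), key=lambda i: sequential[i])
```

After the single "tree" observation the belief is bimodal over cells 0 and 2; after moving one cell and observing "grass", only the hypothesis that started at cell 2 remains plausible.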
While sequential inference can reduce pose ambiguity, it requires observation models that yield smooth, discriminative likelihoods over continuous pose hypotheses. Most prior cross-view methods [5], [6], [8], [9] do not provide continuous likelihoods. Retrieval-based approaches assign similarity scores over a discretized set of aerial patches, making them insensitive to fine-grained pose changes and unsuitable for continuous probabilistic filtering. In contrast,

2026 IEEE International Conference on Robotics and Automation (ICRA 2026), June 1-5, 2026, Vienna, Austria. 979-8-3315-8160-2/26/$31.00 ©2026 IEEE

Index terms

Localization, Field Robots