Ensemble-Based Event Camera Place Recognition under Varying Illumination
Therese Joseph, Tobias Fischer, Michael J Milford
AI summary
Problem
Event-based visual place recognition struggles under severe illumination changes, and prior ensemble methods only fuse across temporal resolutions, leaving reconstruction and feature diversity underexplored.
Approach
The authors aggregate similarity scores from multiple event-to-frame reconstruction methods, state-of-the-art VPR feature extractors, and varying temporal resolutions, combined with a modified sequence matching algorithm that adapts to dynamic history lengths.
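The score-level fusion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each ensemble member (one combination of reconstruction method, feature extractor, and temporal resolution) yields a query-by-reference similarity matrix, that each query row is z-score normalised before fusion, and that members are combined by an unweighted mean. The function names and toy values are hypothetical.

```python
from statistics import mean, pstdev

def zscore_rows(sim):
    """Z-score normalise each query row of a similarity matrix (list of lists)."""
    out = []
    for row in sim:
        mu = mean(row)
        sigma = pstdev(row)
        # Guard against constant rows, which would otherwise divide by zero.
        out.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in row])
    return out

def fuse_ensemble(sim_matrices):
    """Average z-scored similarity matrices element-wise (assumed fusion rule)."""
    normed = [zscore_rows(s) for s in sim_matrices]
    n_q, n_r = len(normed[0]), len(normed[0][0])
    return [[mean(m[q][r] for m in normed) for r in range(n_r)]
            for q in range(n_q)]

def best_matches(fused):
    """Index of the highest-scoring reference for every query."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]

# Two hypothetical ensemble members, 2 queries x 3 references (made-up values).
member_a = [[0.9, 0.2, 0.1], [0.30, 0.40, 0.35]]
member_b = [[0.6, 0.5, 0.1], [0.10, 0.20, 0.90]]

fused = fuse_ensemble([member_a, member_b])
print(best_matches(fused))  # -> [0, 2]
```

For the second query, member A weakly prefers reference 1 while member B strongly prefers reference 2. After normalisation both members contribute on a comparable scale, and the fused score selects reference 2.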
Key results
- Up to 77% relative improvement in Recall@1 across day-night transitions
- Comprehensive ablation of binning strategies, reconstruction methods, and feature extractors
- Modified sequence matching with dynamic history length and z-score normalization
- Validation on two long-term driving datasets without metric subsampling
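One possible reading of "dynamic history length" is sketched below. The summary does not specify the authors' modification, so this is an assumption: a SeqSLAM-style diagonal average whose window shrinks when fewer than `seq_len` earlier frames are available, rather than being skipped or padded. It also assumes the query and reference traverses advance at the same rate, which real driving data would violate. `sequence_match` and the toy matrix are hypothetical.

```python
def sequence_match(sim, seq_len):
    """Average similarity along the diagonal ending at (q, r).

    The history length is capped by how many earlier frames exist, so
    early queries and references still receive a score (assumed behaviour).
    """
    n_q, n_r = len(sim), len(sim[0])
    out = [[0.0] * n_r for _ in range(n_q)]
    for q in range(n_q):
        for r in range(n_r):
            hist = min(seq_len, q + 1, r + 1)  # dynamic history length
            total = sum(sim[q - k][r - k] for k in range(hist))
            out[q][r] = total / hist
    return out

# Toy 3 x 3 similarity matrix (made-up values). Query 1 is a single-frame
# outlier: on its own it prefers reference 2 (0.8) over reference 1 (0.3).
sim = [
    [0.9, 0.1, 0.1],
    [0.2, 0.3, 0.8],
    [0.1, 0.2, 0.7],
]

seq = sequence_match(sim, seq_len=2)
matches = [max(range(len(row)), key=row.__getitem__) for row in seq]
print(matches)  # -> [0, 1, 2]
```

With a history of two frames, the strong match at (query 0, reference 0) supports the diagonal hypothesis and query 1 is reassigned to reference 1.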
Why it matters
Provides a robust, hardware-efficient solution for long-term localization and loop closure in autonomous vehicles operating under extreme lighting conditions.
Abstract
Compared to conventional cameras, event cameras provide a high dynamic range and low latency, offering greater robustness to rapid motion and challenging lighting conditions. Although the potential of event cameras for visual place recognition (VPR) has been established, developing robust VPR frameworks under severe illumination changes remains an open research problem. Here, we introduce an ensemble-based approach to event camera place recognition that combines sequence-matched results from multiple event-to-frame reconstructions, VPR feature extractors, and temporal resolutions. Unlike previous event-based ensemble methods, which only utilise temporal resolution, our broader fusion strategy delivers significantly improved robustness under varied lighting conditions (e.g., afternoon, sunset, night), achieving up to 77% relative improvement in Recall@1 across day-night transitions. We evaluate our approach on two long-term driving datasets (with 8 km per traverse) without metric subsampling, thereby preserving natural variations in speed and stop duration that influence event density. We also conduct a comprehensive analysis of key design choices, including binning strategies, reconstruction methods, and feature extractors, to identify the most critical components for robust performance. Additionally, we propose a modification to the standard sequence matching framework that enhances performance at longer sequence lengths. To facilitate future research, we release our codebase and benchmarking framework.
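To make the headline metric concrete, the sketch below computes Recall@1 and the relative improvement between two methods. The tolerance-based definition of a correct match is an assumption (the abstract does not state the ground-truth criterion), and every number is invented for illustration and is not a result from the paper.

```python
def recall_at_1(predicted, ground_truth, tolerance=0):
    """Fraction of queries whose top-1 reference index lies within
    `tolerance` frames of the ground-truth index (assumed criterion)."""
    hits = sum(abs(p - g) <= tolerance for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)

def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline

# Hypothetical top-1 predictions and ground truth for four queries.
predicted = [0, 2, 5, 7]
ground_truth = [0, 2, 4, 3]
print(recall_at_1(predicted, ground_truth, tolerance=1))  # -> 0.75

# Hypothetical Recall@1 values: 0.40 for a baseline, 0.50 for an ensemble.
gain = relative_improvement(0.40, 0.50)
print(f"{gain:.0%}")  # -> 25%
```

A relative improvement is measured against the baseline value, so the same absolute gain in Recall@1 gives a larger relative figure when the baseline is low, as is typical for day-night matching.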