AI summary
Problem
Existing event-based visual odometry systems struggle with reliability in low-light, high dynamic range (HDR) environments and under aggressive motion due to flawed photometric assumptions and scale ambiguity in monocular deep approaches.
Approach
The authors extend a deep event VO framework to a stereo pipeline by introducing an efficient static stereo association method for sparse depth estimation, tightly coupled with bundle adjustment to enable real-time metric-scale tracking.
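To make the link between static stereo association and metric scale concrete, here is a minimal sketch of the standard rectified-stereo triangulation relation, Z = f · b / d. It is not the paper's association strategy, which the summary does not detail. The function name `depth_from_disparity` and the parameters `focal_px`, `baseline_m`, and `min_disp` are hypothetical, chosen for illustration only.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m, min_disp=1e-3):
    """Metric depth for sparse points in a rectified stereo pair: Z = f * b / d.

    All names here are illustrative assumptions, not taken from the paper.
    disparity_px: horizontal left-right offset of each matched point, in pixels.
    focal_px:     focal length in pixels.
    baseline_m:   distance between the two camera centers, in meters.
    """
    d = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(d.shape, np.nan)      # NaN marks points with no usable match
    valid = d > min_disp                  # reject zero or negative disparities
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth
```

Because the baseline is a known physical length, depths obtained this way are in meters. A monocular pipeline has no such reference, which is why stereo resolves the scale ambiguity.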
Key results
- Achieves state-of-the-art trajectory accuracy across five public stereo event datasets
- Enables real-time online processing of VGA-resolution event data
- Maintains stable pose estimation in large-scale nighttime HDR and aggressive motion scenarios
- Introduces a computationally lightweight static stereo association strategy for metric depth recovery
Why it matters
Provides a robust, real-time localization solution for autonomous robots navigating challenging low-light and high-dynamic-range environments where conventional and event-based systems typically fail.
Abstract
Event-based cameras are bio-inspired sensors with pixels that independently and asynchronously respond to brightness changes at microsecond resolution, offering the potential to handle state estimation tasks involving motion blur and high dynamic range (HDR) illumination conditions. However, event-based visual odometry (VO) that relies on handcrafted data association (in either direct or indirect methods) is still unreliable, especially in field robot applications under low-light HDR conditions, where the dynamic range can be enormous and the signal-to-noise ratio varies spatially and temporally. Leveraging deep neural networks offers new possibilities for overcoming these challenges. In this paper, we propose a learning-based stereo event visual odometry system. Building upon Deep Event Visual Odometry (DEVO), our system (called Stereo-DEVO) introduces a novel and efficient static-stereo association strategy for sparse depth estimation with almost no additional computational burden. By integrating this strategy into a tightly coupled bundle adjustment (BA) optimization scheme, and by benefiting from the recurrent network's ability to perform accurate optical flow estimation through voxel-based event representations to establish reliable patch associations, our system achieves high-precision pose estimation in metric scale. In contrast to the offline performance of DEVO, our system can process event data of Video Graphics Array (VGA) resolution in real time. Extensive evaluations on multiple public real-world datasets and self-collected data confirm our system's versatility, demonstrating superior performance compared to state-of-the-art event-based VO methods. More importantly, our system achieves stable pose estimation even in large-scale nighttime HDR scenarios.
Manuscript received: May 19, 2025; Accepted: August 29, 2025. This paper was recommended for publication by Editor Javier Civera upon evaluation of the Associate Editor and Reviewers' comments. This work was supported by the National Key Research and Development Project of China under Grant 2023YFB4706600. All authors are with the Neuromorphic Automation and Intelligence Lab (NAIL) at the School of Artificial Intelligence and Robotics, Hunan University, Changsha, China. ∗equal contribution; † corresponding author (eeyzhou@hnu.edu.cn).
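As an illustration of the voxel-based event representation mentioned in the abstract, the sketch below uses one widely used construction: each event's polarity is split between its two nearest temporal bins by linear interpolation. The abstract does not specify the exact representation used by DEVO or Stereo-DEVO, so this should be read as an assumption. The function name `events_to_voxel_grid` and its parameters are hypothetical.

```python
import numpy as np

def events_to_voxel_grid(t, x, y, p, num_bins, height, width):
    """Accumulate events (t, x, y, p) into a (num_bins, height, width) grid.

    Illustrative sketch only; the names and the interpolation scheme are
    assumptions, not the paper's confirmed implementation.
    """
    t = np.asarray(t, dtype=np.float64)
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    pol = np.where(np.asarray(p) > 0, 1.0, -1.0)   # map polarity to +1 / -1

    grid = np.zeros((num_bins, height, width), dtype=np.float64)
    if t.size == 0:
        return grid

    # Normalize timestamps to the range [0, num_bins - 1].
    span = max(t.max() - t.min(), 1e-9)
    tn = (t - t.min()) / span * (num_bins - 1)

    lo = np.floor(tn).astype(np.int64)             # earlier neighboring bin
    hi = np.minimum(lo + 1, num_bins - 1)          # later neighboring bin
    w_hi = tn - lo                                 # weight given to the later bin

    # np.add.at accumulates correctly when several events hit the same cell.
    np.add.at(grid, (lo, y, x), pol * (1.0 - w_hi))
    np.add.at(grid, (hi, y, x), pol * w_hi)
    return grid
```

A dense tensor like this preserves coarse timing within the event window while remaining a fixed-size input that a recurrent optical-flow network can consume.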