ICRA 2026

Deep Visual Odometry for Stereo Event Cameras

Sheng Zhong, Junkai Niu, Yi Zhou

AI summary

Stereo-DEVO enables real-time, metric-scale visual odometry with high precision and robustness in challenging nighttime HDR and aggressive motion scenarios, outperforming existing event-based methods.

Stereo Event Cameras, Visual Odometry, Deep Learning, Bundle Adjustment, Real-time Localization, HDR Navigation

Problem

Existing event-based visual odometry systems are unreliable in low-light, high dynamic range (HDR) environments and under aggressive motion, both because their photometric assumptions break down in these conditions and because monocular deep approaches suffer from scale ambiguity.

Approach

The authors extend a deep event VO framework to a stereo pipeline by introducing an efficient static stereo association method for sparse depth estimation, tightly coupled with bundle adjustment to enable real-time metric-scale tracking.
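To make the link between stereo association and metric scale concrete, here is a minimal sketch of the textbook relation for a rectified stereo pair, Z = f * b / d, applied to a handful of sparse patch disparities. It is not the authors' static stereo association method, which this summary does not describe in detail. The function name `depth_from_disparity`, its parameters, and the `min_disparity_px` cutoff are all assumptions made for illustration.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m, min_disparity_px=0.5):
    """Metric depth Z = f * b / d for a rectified stereo pair.

    Hypothetical helper: disparities below min_disparity_px are treated as
    unreliable and returned as NaN instead of a very large depth.
    """
    d = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(d.shape, np.nan)
    valid = d >= min_disparity_px
    # Depth is inversely proportional to disparity; the baseline fixes the scale.
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

# Illustrative values only (not taken from the paper).
patch_disparities = np.array([20.0, 5.0, 0.1])
patch_depths = depth_from_disparity(patch_disparities, focal_px=400.0, baseline_m=0.1)
print(patch_depths)
```

Because the baseline is known in meters, each valid disparity yields a depth in meters, which is what lets a stereo pipeline avoid the scale ambiguity of a monocular one.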

Key results

Achieves state-of-the-art trajectory accuracy across five public stereo event datasets
Enables real-time online processing of VGA-resolution event data
Maintains stable pose estimation in large-scale nighttime HDR and aggressive motion scenarios
Introduces a computationally lightweight static stereo association strategy for metric depth recovery

Why it matters

Provides a robust, real-time localization solution for autonomous robots navigating challenging low-light and high-dynamic-range environments where conventional and event-based systems typically fail.

Abstract

Event-based cameras are bio-inspired sensors with pixels that independently and asynchronously respond to brightness changes at microsecond resolution, offering the potential to handle state estimation tasks involving motion blur and high dynamic range (HDR) illumination conditions. However, the versatility of event-based visual odometry (VO) relying on handcrafted data association (either direct or indirect methods) is still unreliable, especially in field robot applications under low-light HDR conditions, where the dynamic range can be enormous and the signal-to-noise ratio is spatially-and-temporally varying. Leveraging deep neural networks offers new possibilities for overcoming these challenges. In this paper, we propose a learning-based stereo event visual odometry. Building upon Deep Event Visual Odometry (DEVO), our system (called Stereo-DEVO) introduces a novel and efficient static-stereo association strategy for sparse depth estimation with almost no additional computational burden. By integrating it into a tightly coupled bundle adjustment (BA) optimization scheme, and benefiting from the recurrent network’s ability to perform accurate optical flow estimation through voxel-based event representations to establish reliable patch associations, our system achieves high-precision pose estimation in metric scale. In contrast to the offline performance of DEVO, our system can process event data of Video Graphics Array (VGA) resolution in real time. Extensive evaluations on multiple public real-world datasets and self-collected data justify our system’s versatility, demonstrating superior performance compared to state-of-the-art event-based VO methods. More importantly, our system achieves stable pose estimation even in large-scale nighttime HDR scenarios.

Manuscript received: May 19, 2025; Accepted: August 29, 2025. This paper was recommended for publication by Editor Javier Civera upon evaluation of the Associate Editor and Reviewers’ comments. This work was supported by the National Key Research and Development Project of China under Grant 2023YFB4706600.

All authors are with the Neuromorphic Automation and Intelligence Lab (NAIL) at School of Artificial Intelligence and Robotics, Hunan University, Changsha, China. ∗equal contribution; † corresponding author (eeyzhou@hnu.edu.cn).
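The abstract refers to voxel-based event representations as the network input. As a rough illustration, the sketch below builds one commonly used form of event voxel grid, in which each event's polarity is split linearly between its two nearest temporal bins. The exact representation used by DEVO or Stereo-DEVO is not specified here, so the function name `events_to_voxel_grid`, its signature, and the binning scheme are assumptions.

```python
import numpy as np

def events_to_voxel_grid(x, y, t, p, num_bins, height, width):
    """Accumulate events into a (num_bins, height, width) voxel grid.

    Hypothetical helper: x, y are integer pixel coordinates, t are timestamps,
    p are polarities (+1 or -1). Each polarity is shared between the two
    nearest temporal bins with linear weights.
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    t = np.asarray(t, dtype=np.float64)
    p = np.asarray(p, dtype=np.float64)
    grid = np.zeros((num_bins, height, width), dtype=np.float64)
    if t.size == 0:
        return grid

    # Normalize timestamps to the range [0, num_bins - 1].
    span = t.max() - t.min()
    if span > 0:
        t_norm = (t - t.min()) / span * (num_bins - 1)
    else:
        t_norm = np.zeros_like(t)

    lower = np.floor(t_norm).astype(np.int64)
    upper = np.minimum(lower + 1, num_bins - 1)
    w_upper = t_norm - lower
    w_lower = 1.0 - w_upper

    # np.add.at accumulates correctly when several events hit the same cell.
    np.add.at(grid, (lower, y, x), p * w_lower)
    np.add.at(grid, (upper, y, x), p * w_upper)
    return grid

# Three illustrative events on a 1x2 sensor, split into 2 temporal bins.
grid = events_to_voxel_grid(
    x=[0, 1, 0], y=[0, 0, 0], t=[0.0, 0.5, 1.0], p=[1, -1, 1],
    num_bins=2, height=1, width=2,
)
print(grid.shape)
```

Spreading each event over neighboring bins keeps some of the sub-bin timing information that a hard assignment to a single bin would discard.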

Index terms

SLAM, Localization, Deep Learning for Visual Perception