Learning 6D Object Pose Estimation with Event Cameras Using Synthetic Data and Domain Randomization
Oussama Abdul Hay, Xiaoqian Huang, Muhammad Ahmed Humais, Abdulla Ayyad, Randa Almadhoun, Yahya Zweiri
AI summary
Problem
Current event-based pose estimation relies on optimization methods limited to simple objects, while RGB/RGB-D sensors fail under fast motion and challenging lighting.
Approach
The authors propose the Augmented Event Encoder (AEE), a lightweight network trained on domain-randomized synthetic templates converted to event-like edge features, which matches real event queries to a latent template codebook for pose retrieval.
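The retrieval step can be illustrated with a minimal sketch. It assumes the encoder has already produced fixed-length latent codes for each synthetic template and for the real event query. The encoder itself is not shown, and the codes and rotation labels below are hand-picked placeholders. Scoring by cosine similarity is an assumption borrowed from the augmented-autoencoder literature, not a detail stated in this summary. `build_codebook`, `retrieve_pose` and the Euler-angle rotation tuples are hypothetical names used only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Rotation = Tuple[float, float, float]  # hypothetical (roll, pitch, yaw) label in degrees


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a latent code to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def build_codebook(codes: Sequence[Sequence[float]],
                   rotations: Sequence[Rotation]) -> List[Tuple[List[float], Rotation]]:
    """Pair each normalized template code with the rotation the template was rendered at."""
    if len(codes) != len(rotations):
        raise ValueError("one rotation label per template code is required")
    return [(normalize(c), r) for c, r in zip(codes, rotations)]


def retrieve_pose(query: Sequence[float],
                  codebook: List[Tuple[List[float], Rotation]],
                  k: int = 1) -> List[Tuple[float, Rotation]]:
    """Return the k (similarity, rotation) pairs whose codes best match the query."""
    q = normalize(query)
    scored = [(sum(a * b for a, b in zip(q, code)), rot) for code, rot in codebook]
    scored.sort(key=lambda s: s[0], reverse=True)  # highest cosine similarity first
    return scored[:k]


# Usage with placeholder 3-D codes (real latent codes would be much longer).
codebook = build_codebook(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 90.0), (0.0, 90.0, 0.0)],
)
best = retrieve_pose([0.1, 0.9, 0.0], codebook, k=2)
```

Here the query lies closest to the second template, so its rotation label is returned first. Keeping the top k candidates rather than only the best match leaves room for a later refinement or voting step, if one is used.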
Key results
- First learning-based 6D pose estimation method for event cameras
- 5 ms inference time enabling ~200 Hz real-time performance
- Robust accuracy across varying illumination and high-speed motion on the E-POSE dataset
- Strong performance on the ADD-S (Rotation) metric without any real event data used for training
Why it matters
Enables reliable, real-time object tracking for robotics and autonomous systems in dynamic or low-light environments where conventional cameras fail.
Abstract
Estimating the 6D pose of rigid objects is a critical upstream task in many robotics applications. Most existing methods rely on RGB or RGB-D sensing modalities, which suffer from limitations under challenging lighting conditions and high-speed motion. In contrast, event-based cameras offer unique advantages such as high temporal resolution and high dynamic range, making them well-suited for such scenarios. However, current event-based pose estimation methods are typically optimization-based, designed for relatively simple objects, and require hand-crafted parameters. In this work, we introduce the first learning-based approach for 6D object pose estimation using event cameras, employing an Augmented Event Encoder (AEE) trained entirely on synthetic data and validated on the E-POSE dataset. Our model leverages an augmented autoencoder with domain randomization to map synthetic templates into a latent space, enabling accurate matching with real event query images. The method demonstrates robust performance across various scenarios, including changes in illumination and camera speed, and achieves strong results on the ADD-S (Rotation) metric.
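The training idea behind an augmented autoencoder can be sketched as building (input, target) pairs. The encoder sees a randomized version of a template, and the decoder is asked to reconstruct the clean version, so the latent code learns to ignore the randomized factors. This is a simplified sketch under stated assumptions. The actual conversion of synthetic templates into event-like features is not specified here, so a thresholded forward-difference gradient map stands in for it. The randomization (contrast, brightness, Gaussian noise) and all names (`edge_map`, `randomize`, `make_training_pair`) are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[float]]   # grayscale intensities in [0, 1]
EdgeMap = List[List[int]]   # 1 = "event-like" edge pixel, 0 = none


def edge_map(img: Image, threshold: float = 0.1) -> EdgeMap:
    """Mark pixels whose forward-difference gradient magnitude exceeds the threshold.

    Stand-in for an event-like representation; the last row and column have no
    forward neighbour and are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                out[y][x] = 1
    return out


def randomize(img: Image, rng: random.Random, noise_std: float = 0.05) -> Image:
    """Hypothetical domain randomization: random contrast and brightness plus Gaussian noise."""
    contrast = rng.uniform(0.5, 1.5)
    brightness = rng.uniform(-0.2, 0.2)
    return [[min(1.0, max(0.0, contrast * p + brightness + rng.gauss(0.0, noise_std)))
             for p in row] for row in img]


def make_training_pair(template: Image, rng: random.Random,
                       threshold: float = 0.1) -> Tuple[EdgeMap, EdgeMap]:
    """Return (randomized input, clean reconstruction target) as edge maps.

    Noise can trigger spurious edges in the input, loosely mimicking sensor noise.
    """
    return edge_map(randomize(template, rng), threshold), edge_map(template, threshold)


# Usage: a 6x6 synthetic template with a bright 2x2 square in the middle.
template = [[1.0 if 2 <= y <= 3 and 2 <= x <= 3 else 0.0 for x in range(6)]
            for y in range(6)]
noisy_input, clean_target = make_training_pair(template, random.Random(0))
```

Because the target is always the clean edge map, an autoencoder trained on many such pairs is pushed to encode object shape and orientation rather than lighting or noise. This is the property that lets codes computed from synthetic templates be compared with codes computed from real event queries.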