ICRA 2026

Learning 6D Object Pose Estimation with Event Cameras Using Synthetic Data and Domain Randomization

Oussama Abdul Hay, Xiaoqian Huang, Muhammad Ahmed Humais, Abdulla Ayyad, Randa Almadhoun, Yahya Zweiri


AI summary

The first learning-based 6D pose estimation method for event cameras, trained entirely on synthetic data, achieves robust real-time accuracy under challenging lighting and fast motion.
6D pose estimation, event cameras, domain randomization, synthetic data, template matching, robotics perception

Problem

Existing event-based pose estimation methods are optimization-based, designed for relatively simple objects, and dependent on hand-crafted parameters, while RGB/RGB-D sensors degrade under fast motion and challenging lighting.

Approach

The authors propose the Augmented Event Encoder (AEE), a lightweight network trained on domain-randomized synthetic templates converted into event-like edge representations. The encoder maps these templates into a latent codebook; at test time, each real event query is encoded and matched against the codebook, and the pose of the best-matching template is retrieved.
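The codebook lookup described above amounts to a nearest-neighbour search in latent space. The sketch below uses cosine similarity over L2-normalised codes, the measure used in the original augmented-autoencoder work; whether AEE uses the same measure is an assumption. Random vectors stand in for the trained encoder's outputs, and the 128-dimensional code size, the z-axis rotation labels, and all function names are hypothetical.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """3x3 rotation about the z axis (used here only to label dummy templates)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def build_codebook(template_codes: np.ndarray) -> np.ndarray:
    """L2-normalise each template's latent code (one row per rendered template)."""
    norms = np.linalg.norm(template_codes, axis=1, keepdims=True)
    return template_codes / np.maximum(norms, 1e-12)


def retrieve(query_code: np.ndarray, codebook: np.ndarray, k: int = 1):
    """Return indices and cosine similarities of the k closest templates."""
    q = query_code / max(np.linalg.norm(query_code), 1e-12)
    sims = codebook @ q               # cosine similarity to every template
    top = np.argsort(-sims)[:k]       # best matches first
    return top, sims[top]


# Hypothetical stand-ins: random vectors replace the trained encoder's outputs,
# and each template is labelled with a rotation about z (one per degree).
rng = np.random.default_rng(0)
n_templates, dim = 360, 128
angles = np.deg2rad(np.arange(n_templates))
template_rotations = np.stack([rot_z(a) for a in angles])
codebook = build_codebook(rng.normal(size=(n_templates, dim)))

# A "real" query whose code lies near template 42, perturbed by noise.
query = codebook[42] + 0.01 * rng.normal(size=dim)
idx, sims = retrieve(query, codebook, k=3)
estimated_rotation = template_rotations[idx[0]]
```

Returning the top k matches rather than only the best one leaves room for refining the pose from several neighbouring templates, though the summary does not say whether AEE does this.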

Key results

  • First learning-based 6D pose estimation method for event cameras
  • 5 ms inference time enabling ~200 Hz real-time performance
  • Robust accuracy across varying illumination and high-speed motion on the E-POSE dataset
  • Strong performance on the ADD-S (Rotation) metric, with no real event data used for training
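ADD-S is the symmetric variant of the average-distance (ADD) pose error: for each model point under the ground-truth pose it takes the distance to the closest model point under the estimated pose, so symmetric objects are not penalised for visually equivalent poses. The NumPy sketch below implements standard ADD-S. The summary does not define "ADD-S (Rotation)"; treating it as ADD-S with translation removed, as the hypothetical `add_s_rotation` does, is an assumption, and the toy model is illustrative only.

```python
import numpy as np


def add_s(points, R_gt, t_gt, R_est, t_est):
    """ADD-S: for each model point under the ground-truth pose, the distance to the
    closest model point under the estimated pose, averaged over all points."""
    gt = points @ R_gt.T + t_gt          # (N, 3) model in ground-truth pose
    est = points @ R_est.T + t_est       # (N, 3) model in estimated pose
    # Brute-force pairwise distances (N x N); a KD-tree scales better for dense models.
    d = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())


def add_s_rotation(points, R_gt, R_est):
    """Hypothetical rotation-only variant: both translations are set to zero."""
    zero = np.zeros(3)
    return add_s(points, R_gt, zero, R_est, zero)


def rot_z(theta):
    """3x3 rotation about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# A toy model with 4-fold symmetry about z: a 90-degree error is invisible to ADD-S.
square = np.array([[1.0, 0, 0], [0, 1.0, 0], [-1.0, 0, 0], [0, -1.0, 0]])
err_sym = add_s_rotation(square, np.eye(3), rot_z(np.pi / 2))   # ~0
err_45 = add_s_rotation(square, np.eye(3), rot_z(np.pi / 4))    # 2*sin(22.5 deg) ~ 0.765
```

The 90-degree case shows why ADD-S suits symmetric objects: plain ADD would report a large error for a pose that looks identical.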

Why it matters

Enables reliable, real-time 6D pose estimation for robotics and autonomous systems in dynamic or low-light environments, where conventional frame-based cameras struggle.

Abstract

Estimating the 6D pose of rigid objects is a critical upstream task in many robotics applications. Most existing methods rely on RGB or RGB-D sensing modalities, which suffer from limitations under challenging lighting conditions and high-speed motion. In contrast, event-based cameras offer unique advantages such as high temporal resolution and high dynamic range, making them well-suited for such scenarios. However, current event-based pose estimation methods are typically optimization-based, designed for relatively simple objects, and require hand-crafted parameters. In this work, we introduce the first learning-based approach for 6D object pose estimation using event cameras, employing an Augmented Event Encoder (AEE) trained entirely on synthetic data and validated on the E-POSE dataset. Our model leverages an augmented autoencoder with domain randomization to map synthetic templates into a latent space, enabling accurate matching with real event query images. The method demonstrates robust performance across various scenarios, including changes in illumination and camera speeds, and achieves strong results on the ADD-S (Rotation) metric.
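The abstract's recipe pairs domain randomization with an augmented autoencoder. In the original augmented-autoencoder formulation, the encoder receives a randomly perturbed version of each synthetic template while the decoder is trained to reconstruct the clean template, so the latent code learns to ignore nuisance factors. The NumPy sketch below shows only the data and loss side of that idea for event-like edge images. The gradient-threshold edge conversion, the shift/dropout/noise augmentations, their parameter values, and all function names are hypothetical and not taken from the paper; the encoder/decoder network itself is omitted.

```python
import numpy as np


def to_edge_image(render: np.ndarray, thresh: float = 0.1) -> np.ndarray:
    """Hypothetical conversion of a grayscale render (values in [0, 1]) into a
    binary, event-like edge map by thresholding the intensity-gradient magnitude."""
    gy, gx = np.gradient(render.astype(float))
    return (np.hypot(gx, gy) > thresh).astype(np.float32)


def randomize(edges: np.ndarray, rng: np.random.Generator, max_shift: int = 4,
              drop_p: float = 0.2, noise_p: float = 0.01) -> np.ndarray:
    """Domain-randomised encoder input: random shift, dropped edge pixels
    (missing events) and isolated spurious pixels (background noise events)."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(edges, shift=(int(dy), int(dx)), axis=(0, 1))
    out = out * (rng.random(out.shape) >= drop_p)
    out = np.maximum(out, rng.random(out.shape) < noise_p)
    return out.astype(np.float32)


def reconstruction_loss(decoded: np.ndarray, clean_target: np.ndarray) -> float:
    """Augmented-autoencoder objective: pixel-wise MSE between the decoder output
    (computed from the randomised input) and the clean, un-augmented template."""
    return float(np.mean((decoded - clean_target) ** 2))


# Toy synthetic template: a bright square on a dark background.
rng = np.random.default_rng(0)
render = np.zeros((64, 64))
render[20:44, 20:44] = 1.0
clean = to_edge_image(render)         # training target
noisy = randomize(clean, rng)         # what the encoder would see
```

In training, `noisy` would be fed to the encoder and the decoder output scored against `clean` with `reconstruction_loss`; after training, only the encoder is needed to map templates into the latent space used for matching.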

Index terms

Deep Learning for Visual Perception; Perception for Grasping and Manipulation; Deep Learning in Grasping and Manipulation
