Event-based Motion & Appearance Fusion for 6D Object Pose Tracking
Zhichao Li, Chiara Bartolozzi, Lorenzo Natale, Arren Glover
AI summary
Problem
RGB/RGB-D cameras suffer from motion blur and low update rates in dynamic scenes, while few existing event-based pose trackers offer robust, high-frequency tracking without heavy computation or depth dependencies.
Approach
The method estimates 6D object velocity from event-based optical flow to propagate the pose, then corrects accumulated drift by matching perturbed appearance templates against a velocity-independent event representation, all without depth sensors or neural networks.
Key results
- Fuses event optical flow and template matching in a learning-free pipeline
- Removes dependency on external depth measurements for velocity estimation
- Performs comparably to state-of-the-art algorithms and in some cases outperforms them on fast-moving objects
- Validated on both synthetic and real-world event camera datasets
Why it matters
Enables high-frequency, blur-free 6D pose tracking for robotics in highly dynamic environments where conventional cameras and heavy neural networks struggle.
Abstract
Object pose tracking is a fundamental and essential task for robots to perform tasks in home and industrial settings. The most commonly used sensors for this are RGB-D cameras, which can hit limitations in highly dynamic environments due to motion blur and frame-rate constraints. Event cameras have remarkable features such as high temporal resolution and low latency, which make them a potentially ideal vision sensor for object pose tracking at high speed. Even so, there are still only a few works on 6D pose tracking with event cameras. In this work, we take advantage of the high temporal resolution and propose a method that fuses a pose propagation step with a pose correction strategy. Specifically, we use 6D object velocity obtained from event-based optical flow for pose propagation, after which a template-based local pose correction module is applied. Our learning-free method has comparable performance to the state-of-the-art algorithms, and in some cases outperforms them for fast-moving objects. The results indicate the potential for using event cameras in highly dynamic scenarios where the use of deep network approaches is limited by low update rates.
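
The propagation step described in the abstract can be illustrated with a minimal sketch. It assumes the 6D velocity is already available as a linear part `v` and an angular part `w`, both expressed in the camera frame and held constant over a short interval `dt`; how that velocity is estimated from event-based optical flow is not shown. The function names (`skew`, `exp_se3`, `propagate_pose`) and the frame convention are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def skew(w):
    """Return the 3x3 skew-symmetric matrix of a 3-vector."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])

def exp_se3(v, w, dt):
    """SE(3) exponential of a constant twist (v, w) applied for dt seconds."""
    rho = np.asarray(v, dtype=float) * dt   # integrated linear part
    phi = np.asarray(w, dtype=float) * dt   # integrated rotation vector
    theta = np.linalg.norm(phi)
    K = skew(phi)
    if theta < 1e-9:
        # First-order approximation for very small rotations
        R = np.eye(3) + K
        V = np.eye(3) + 0.5 * K
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta**2
        c = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + a * K + b * (K @ K)   # Rodrigues' formula
        V = np.eye(3) + b * K + c * (K @ K)   # couples rotation and translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def propagate_pose(T_obj, v, w, dt):
    """Propagate a 4x4 object pose by a camera-frame twist (assumed convention)."""
    return exp_se3(v, w, dt) @ T_obj
```

Calling `propagate_pose` once per velocity estimate yields a high-rate pose stream, but integration error accumulates over time. This is why the abstract pairs propagation with a template-based correction step.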