ICRA 2026

Event-based Motion & Appearance Fusion for 6D Object Pose Tracking

Zhichao Li, Chiara Bartolozzi, Lorenzo Natale, Arren Glover

AI summary

A learning-free event-camera pipeline combining optical flow propagation and template correction matches or beats deep RGB-D trackers, particularly for fast-moving objects.

Event cameras, 6D pose tracking, optical flow, template matching, motion blur, robotics

Problem

RGB and RGB-D cameras suffer from motion blur and low update rates in dynamic scenes, and few existing event-based pose trackers run robustly at high frequency without heavy computation or a dependence on depth.

Approach

The method estimates 6D object velocity from event-based optical flow to propagate the pose, then corrects accumulated drift by matching perturbed appearance templates against a velocity-independent event representation, all without depth sensors or neural networks.
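As a rough illustration of the propagation step, the sketch below integrates a constant 6D velocity over a short interval using the SE(3) exponential map. It assumes the velocity is expressed in the camera frame and held constant over dt. The function names (skew, exp_se3, propagate_pose) and the frame convention are illustrative assumptions rather than details from the paper, and the estimation of the velocity from event-based optical flow is not shown.

```python
import numpy as np


def skew(w):
    """Return the 3x3 skew-symmetric matrix K such that K @ p == np.cross(w, p)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])


def exp_se3(v, w, dt):
    """Exponential map of a constant twist (v, w) applied for dt seconds.

    v: linear velocity (m/s), w: angular velocity (rad/s), both 3-vectors.
    Returns a 4x4 homogeneous transform.
    """
    v = np.asarray(v, dtype=float) * dt
    w = np.asarray(w, dtype=float) * dt
    theta = np.linalg.norm(w)
    K = skew(w)
    if theta < 1e-9:
        # Small-angle limit: first-order approximation of R and V.
        R = np.eye(3) + K
        V = np.eye(3) + 0.5 * K
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta**2
        c = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + a * K + b * (K @ K)  # Rodrigues' rotation formula
        V = np.eye(3) + b * K + c * (K @ K)  # couples rotation into translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T


def propagate_pose(T_cam_obj, v, w, dt):
    """Propagate a 4x4 object pose by a twist given in the camera frame (assumed)."""
    return exp_se3(v, w, dt) @ np.asarray(T_cam_obj, dtype=float)
```

Calling propagate_pose once per velocity estimate accumulates the pose over time. Small velocity errors accumulate too, which is the drift the correction step is meant to remove.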

Key results

Fuses event optical flow and template matching in a learning-free pipeline
Removes dependency on external depth measurements for velocity estimation
Matches or outperforms RGB-D deep learning baselines on fast-moving objects
Validated on both synthetic and real-world event camera datasets

Why it matters

Enables high-frequency, blur-free 6D pose tracking for robotics in highly dynamic environments where conventional cameras and heavy neural networks struggle.

Abstract

Object pose tracking is a fundamental and essential task for robots operating in home and industrial settings. The most commonly used sensors for this task are RGB-D cameras, which can hit limitations in highly dynamic environments due to motion blur and frame-rate constraints. Event cameras have remarkable features such as high temporal resolution and low latency, which make them potentially ideal vision sensors for object pose tracking at high speed. Even so, there are still only a few works on 6D pose tracking with event cameras. In this work, we take advantage of the high temporal resolution and propose a method that fuses a propagation step with a pose correction strategy. Specifically, we use the 6D object velocity obtained from event-based optical flow for pose propagation, after which a template-based local pose correction module corrects the pose. Our learning-free method has comparable performance to state-of-the-art algorithms, and in some cases outperforms them for fast-moving objects. The results indicate the potential of event cameras in highly dynamic scenarios where the use of deep network approaches is limited by low update rates.
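The local pose correction mentioned in the abstract can be pictured as a small search around the propagated pose. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it perturbs the pose by one step along each of the six degrees of freedom and keeps the candidate with the highest score. The score_fn callback is a hypothetical stand-in for comparing a template rendered at the candidate pose against the event representation, and the step sizes are arbitrary placeholder values.

```python
import numpy as np


def rodrigues(axis, angle):
    """Rotation matrix for a rotation of `angle` radians about a unit `axis`."""
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]], dtype=float)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def perturbed_poses(T, trans_step, rot_step):
    """Yield T itself plus 12 single-axis perturbations (+/- on each of 6 DoF)."""
    T = np.asarray(T, dtype=float)
    yield T.copy()
    for axis in np.eye(3):
        for sign in (1.0, -1.0):
            # Translate along a camera-frame axis.
            T_trans = T.copy()
            T_trans[:3, 3] += sign * trans_step * axis
            yield T_trans
            # Rotate the object about its own origin, around a camera-frame axis.
            T_rot = T.copy()
            T_rot[:3, :3] = rodrigues(axis, sign * rot_step) @ T[:3, :3]
            yield T_rot


def correct_pose(T, score_fn, trans_step=0.005, rot_step=np.deg2rad(2.0)):
    """Return the candidate pose with the highest score.

    score_fn is a hypothetical callback mapping a 4x4 pose to a similarity
    score (higher is better); the default step sizes are placeholders.
    """
    return max(perturbed_poses(T, trans_step, rot_step), key=score_fn)
```

Because the unperturbed pose is among the candidates, a single call never returns a pose that scores lower than the input. Repeating the call refines the estimate one step at a time.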

Index terms

Visual Tracking