ICRA 2026

Keypoint-Based Dynamic Object 6-DoF Pose Tracking Via Event Camera

Zhe Wang, Qijin Song, Zihao Li, Jingyu Xiao, Weibang Bai

AI summary

A keypoint-based event camera pipeline achieves robust, high-speed 6-DoF pose tracking for curved objects without requiring a predefined initial pose.
Keywords: event camera, 6-DoF pose estimation, keypoint tracking, dynamic object tracking, Extended Kalman filter, robotic manipulation

Problem

Conventional cameras suffer from motion blur and low-light limitations when tracking fast-moving objects, while existing event-based pose methods struggle with curved geometries and require fixed initial poses.

Approach

The method detects object keypoints from event time surfaces using a lightweight neural network, tracks them via polarity-adaptive event density matching and an Extended Kalman Filter, and computes 6-DoF pose through 2D-3D correspondence.
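The time surface that feeds the keypoint detector can be illustrated with a minimal sketch. It assumes an exponential-decay time surface, a common formulation that the summary does not confirm for this paper. The function name `time_surface`, the `(x, y, t, polarity)` event layout, and the decay constant `tau` are all hypothetical choices made for illustration.

```python
import math


def time_surface(events, width, height, t_ref, tau=0.03):
    """Build a per-pixel exponential-decay time surface (assumed formulation).

    events: iterable of (x, y, t, polarity) tuples, with t in seconds.
    Pixels that saw no event at or before t_ref stay at 0.0.
    """
    # Timestamp of the most recent event at each pixel (None = no event yet).
    last_t = [[None] * width for _ in range(height)]
    for x, y, t, _polarity in events:
        if t <= t_ref and 0 <= x < width and 0 <= y < height:
            if last_t[y][x] is None or t > last_t[y][x]:
                last_t[y][x] = t
    # Recent events map close to 1; older events decay toward 0.
    return [
        [0.0 if t is None else math.exp(-(t_ref - t) / tau) for t in row]
        for row in last_t
    ]


# Three synthetic events on a 4x3 sensor; pixel (1, 0) fires twice.
events = [(1, 0, 0.010, 1), (2, 1, 0.040, -1), (1, 0, 0.025, 1)]
surface = time_surface(events, width=4, height=3, t_ref=0.040, tau=0.03)
```

In this sketch the surface is indexed as `surface[y][x]`, and polarity is ignored; a detector could equally be fed one surface per polarity.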

Key results

  • Lightweight neural network for robust keypoint detection in sparse event data
  • Polarity-aware event density tracking algorithm with Extended Kalman Filter for drift reduction
  • Structure-aware loss function ensuring geometric consistency and precise localization
  • Superior accuracy and robustness over state-of-the-art methods in both simulated and real high-speed motion tests
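The tracking step listed above can be sketched roughly as follows. This is not the authors' algorithm: the rule for choosing a polarity (`dominant_polarity_centroid`, which keeps whichever polarity has more events in a square window) and the constant-velocity motion model are assumptions. With a linear model like this one, the Extended Kalman Filter reduces to the standard Kalman filter, which is what `AxisKalman` implements for a single image axis.

```python
def dominant_polarity_centroid(events, keypoint, radius):
    """Centroid of the more frequent polarity inside a square window.

    events: iterable of (x, y, t, polarity) with polarity in {+1, -1}.
    Returns None if the window contains no events. (Hypothetical rule.)
    """
    kx, ky = keypoint
    groups = {1: [], -1: []}
    for x, y, _t, p in events:
        if p in groups and abs(x - kx) <= radius and abs(y - ky) <= radius:
            groups[p].append((x, y))
    pts = max(groups.values(), key=len)
    if not pts:
        return None
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


class AxisKalman:
    """Constant-velocity Kalman filter for one axis (state: position, velocity)."""

    def __init__(self, pos, q=1.0, r=1.0):
        self.p, self.v = float(pos), 0.0
        # Symmetric 2x2 covariance stored as its three distinct entries.
        self.c_pp, self.c_pv, self.c_vv = 1.0, 0.0, 1.0
        self.q, self.r = q, r  # process and measurement noise (assumed values)

    def predict(self, dt):
        # x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        self.p += self.v * dt
        c_pp = self.c_pp + 2 * dt * self.c_pv + dt * dt * self.c_vv
        c_pv = self.c_pv + dt * self.c_vv
        self.c_pp, self.c_pv = c_pp, c_pv
        self.c_vv += self.q * dt

    def update(self, z):
        # Position-only measurement, H = [1, 0].
        s = self.c_pp + self.r
        k_p, k_v = self.c_pp / s, self.c_pv / s
        innovation = z - self.p
        self.p += k_p * innovation
        self.v += k_v * innovation
        # P = (I - K H) P, using the pre-update covariance on the right.
        c_pp = (1 - k_p) * self.c_pp
        c_pv = (1 - k_p) * self.c_pv
        c_vv = self.c_vv - k_v * self.c_pv
        self.c_pp, self.c_pv, self.c_vv = c_pp, c_pv, c_vv


# One tracking step for a keypoint currently believed to be at (10, 20).
events = [(11, 20, 0.001, 1), (13, 20, 0.002, 1),
          (12, 22, 0.002, 1), (10, 19, 0.002, -1)]
z = dominant_polarity_centroid(events, (10.0, 20.0), radius=3)
fx, fy = AxisKalman(10.0), AxisKalman(20.0)
for f, meas in ((fx, z[0]), (fy, z[1])):
    f.predict(dt=0.001)
    f.update(meas)
```

The filtered position lands between the prior estimate and the density centroid, which is how a filter of this kind damps drift from noisy per-step measurements.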

Why it matters

Enables reliable robotic manipulation and assembly of fast-moving, curved objects in challenging environments where traditional vision fails.

Abstract

Accurate 6-DoF pose estimation of objects is critical for robots to perform precise manipulation tasks. However, for dynamic object pose estimation, conventional camera-based approaches face several major challenges, such as motion blur, sensor noise, and low-light limitations. To address these issues, we employ event cameras, whose high dynamic range and low latency offer a promising solution. Furthermore, we propose a keypoint-based detection and tracking approach for dynamic object pose estimation. Firstly, a keypoint detection network is constructed to extract keypoints from the time surface generated by the event stream. Subsequently, the polarity and spatial coordinates of the events are leveraged, and the event density in the vicinity of each keypoint is utilized to achieve continuous keypoint tracking. Finally, a hash mapping is established between the 2D keypoints and the 3D model keypoints, and the EPnP algorithm is employed to estimate the 6-DoF pose. Experimental results demonstrate that, whether in simulated or real event environments, the proposed method outperforms the event-based state-of-the-art methods in terms of both accuracy and robustness.
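The 2D-3D hash mapping described in the abstract can be pictured as a dictionary lookup keyed by keypoint ID. The sketch below assumes that reading; the function names, the string IDs, and the camera intrinsics are hypothetical. It builds the ordered correspondence lists that an EPnP solver expects (for example OpenCV's `cv2.solvePnP` with `flags=cv2.SOLVEPNP_EPNP`, which needs at least four pairs) and scores them with a pinhole reprojection error. To stay dependency-free it does not run EPnP itself, and it treats the model points as already expressed in the camera frame.

```python
import math


def build_correspondences(detections, model_points):
    """Pair 2D keypoints with 3D model keypoints that share an ID.

    detections: dict id -> (u, v) in pixels.
    model_points: dict id -> (X, Y, Z) in metres.
    IDs missing from either side are skipped.
    """
    ids = sorted(k for k in detections if k in model_points)
    pts_3d = [model_points[k] for k in ids]
    pts_2d = [detections[k] for k in ids]
    return ids, pts_3d, pts_2d


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    X, Y, Z = point
    return (fx * X / Z + cx, fy * Y / Z + cy)


def mean_reprojection_error(pts_3d, pts_2d, fx, fy, cx, cy):
    """Average pixel distance between projected and observed keypoints."""
    total = 0.0
    for p3, (u, v) in zip(pts_3d, pts_2d):
        pu, pv = project(p3, fx, fy, cx, cy)
        total += math.hypot(pu - u, pv - v)
    return total / len(pts_3d)


# Hypothetical model keypoints (camera frame) and tracked 2D detections.
model_points = {"a": (0.0, 0.0, 1.0), "b": (0.1, 0.0, 1.0),
                "c": (0.0, 0.1, 1.0), "d": (0.1, 0.1, 2.0)}
detections = {"a": (50.0, 40.0), "b": (60.0, 40.0), "c": (50.0, 50.0),
              "d": (55.0, 48.0), "e": (1.0, 1.0)}  # "e" has no 3D match

ids, pts_3d, pts_2d = build_correspondences(detections, model_points)
err = mean_reprojection_error(pts_3d, pts_2d, fx=100.0, fy=100.0,
                              cx=50.0, cy=40.0)
```

Keying both sides by ID keeps the pairing stable when a keypoint is temporarily lost, since the remaining IDs still line up with their model points.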

Index terms

Visual Tracking; Deep Learning for Visual Perception
