DynOPETs: A Versatile Benchmark for Dynamic Object Pose Estimation and Tracking in Moving Camera Scenarios
Xiangting Meng, Jiaqi Yang, Mingshu Chen, Chenxin Yan, Yujiao Shi, Wenchao Ding, Laurent Kneip
AI summary
Problem
Existing real-world datasets predominantly assume static cameras or stationary objects, leaving a critical gap for scenarios with simultaneous camera ego-motion and object dynamics due to the inherent difficulty of accurate pose annotation.
Approach
The authors developed a novel data acquisition and annotation pipeline that fuses absolute pose estimation, relative point tracking, and pose graph optimization to automatically generate high-quality 6-DoF pose annotations for dynamic objects captured by moving RGB-D cameras.
Key results
- A novel automated annotation pipeline fusing absolute pose estimation, relative tracking, and graph optimization
- DynOPETs dataset comprising 175 RGB-D sequences of dynamic objects with synchronized 6-DoF camera and object poses
- Comprehensive benchmark of 19 state-of-the-art instance-level, unseen, and category-level pose estimation methods
- Open-source release of the dataset, code, and annotations to accelerate research in embodied AI and AR/MR
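To illustrate what synchronized 6-DoF camera and object poses make possible, the following minimal sketch converts an object pose from a world frame into the camera frame using 4x4 homogeneous transforms. The frame naming (`T_world_cam`, `T_world_obj`), the helper functions, and the numeric values are assumptions made for illustration; the dataset's actual storage format and conventions are not described here.

```python
import numpy as np

def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def rot_z(angle):
    """Rotation about the z-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def object_in_camera(T_world_cam, T_world_obj):
    """Express the object pose in the camera frame: T_cam_obj = T_world_cam^-1 @ T_world_obj."""
    return np.linalg.inv(T_world_cam) @ T_world_obj

# Hypothetical poses for a single frame (assumed values, not dataset data).
T_world_cam = make_pose(rot_z(np.pi / 2), [1.0, 0.0, 0.0])
T_world_obj = make_pose(np.eye(3), [1.0, 2.0, 0.0])

T_cam_obj = object_in_camera(T_world_cam, T_world_obj)

# Composing the camera pose with the camera-frame object pose recovers the world pose.
T_world_obj_check = T_world_cam @ T_cam_obj
```

When both the camera and the object move, `T_world_cam` and `T_world_obj` change from frame to frame, so the camera-frame pose must be recomputed per frame rather than derived from a fixed offset.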
Why it matters
Provides the first large-scale real-world benchmark for dynamic pose estimation with moving cameras, enabling researchers and developers in robotics, AR/MR, and embodied AI to train and evaluate robust models.
Abstract
In the realm of object pose estimation, scenarios involving both dynamic objects and moving cameras are prevalent. However, the scarcity of corresponding real-world datasets significantly hinders the development and evaluation of robust pose estimation models. This is largely attributed to the inherent challenges in accurately annotating object poses in dynamic scenes captured by moving cameras. To bridge this gap, this paper presents a novel dataset, DynOPETs, and a dedicated data acquisition and annotation pipeline tailored for object pose estimation and tracking in such unconstrained environments. Our efficient annotation method innovatively integrates pose estimation and pose tracking techniques to generate pseudo-labels, which are subsequently refined through pose graph optimization. The resulting dataset offers accurate pose annotations for dynamic objects observed from moving cameras. To validate the effectiveness and value of our dataset, we perform comprehensive evaluations using 19 state-of-the-art methods, demonstrating its potential to accelerate research in this challenging domain. The dataset will be made publicly available to facilitate further exploration and advancement in the field.
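The idea of refining pseudo-labels by combining absolute estimates with relative tracking can be illustrated with a deliberately simplified, translation-only toy problem. This is not the authors' pipeline: a real pose graph optimizes over full 6-DoF poses with rotations, whereas this sketch assumes positions only, a chain of consecutive frames, and hand-picked weights `w_abs` and `w_rel`. All names and values are hypothetical.

```python
import numpy as np

def fuse_translations(absolute, relative, w_abs=1.0, w_rel=10.0):
    """Fuse per-frame absolute positions with frame-to-frame displacements.

    Minimizes  sum_i w_abs * ||x_i - absolute_i||^2
             + sum_i w_rel * ||(x_{i+1} - x_i) - relative_i||^2
    as a linear least-squares problem (translation-only simplification).
    """
    absolute = np.asarray(absolute, dtype=float)   # shape (n, d)
    relative = np.asarray(relative, dtype=float)   # shape (n - 1, d)
    n, d = absolute.shape
    A = np.zeros((2 * n - 1, n))
    b = np.zeros((2 * n - 1, d))
    sa, sr = np.sqrt(w_abs), np.sqrt(w_rel)
    # Absolute (unary) constraints: x_i should match the absolute estimate.
    for i in range(n):
        A[i, i] = sa
        b[i] = sa * absolute[i]
    # Relative (binary) constraints: x_{i+1} - x_i should match the tracked motion.
    for i in range(n - 1):
        A[n + i, i] = -sr
        A[n + i, i + 1] = sr
        b[n + i] = sr * relative[i]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Hypothetical ground truth: an object moving one unit per frame along x.
truth = np.array([[float(i), 0.0, 0.0] for i in range(5)])
# Assumed noisy absolute estimates (alternating error along x).
noise = np.array([[0.3, 0.0, 0.0], [-0.3, 0.0, 0.0], [0.3, 0.0, 0.0],
                  [-0.3, 0.0, 0.0], [0.3, 0.0, 0.0]])
absolute = truth + noise
# Assumed accurate relative displacements between consecutive frames.
relative = np.diff(truth, axis=0)

fused = fuse_translations(absolute, relative)
raw_error = np.linalg.norm(absolute - truth)
fused_error = np.linalg.norm(fused - truth)
```

In this toy setting, the relative constraints smooth out frame-to-frame jitter in the absolute estimates, while the absolute constraints keep the trajectory from drifting, which is the general motivation for combining the two sources.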