ICRA 2024

AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT

Fangbo Qin, Taogang Hou, Shan Lin, Kaiyuan Wang, Michael C. Yip, Shan Yu

Abstract

Towards flexible object-centric visual perception, we propose AnyOKP, a one-shot, instance-aware object keypoint (OKP) extraction approach that leverages the powerful representation ability of a pretrained vision transformer (ViT) and can obtain keypoints on multiple object instances of arbitrary category after learning from a support image. An off-the-shelf pretrained ViT is directly deployed for generalizable and transferable feature extraction, which is followed by training-free feature enhancement. The best-prototype pairs (BPPs) are searched for in the support and query images based on appearance similarity, yielding instance-unaware candidate keypoints. Then, the entire graph with all candidate keypoints as vertices is divided into sub-graphs according to the feature distributions on the graph edges. Finally, each sub-graph represents an object instance. AnyOKP is evaluated on real object images collected with the cameras of a robot arm, a mobile robot, and a surgical robot; the results not only demonstrate cross-category flexibility and instance awareness, but also show remarkable robustness to domain shift and viewpoint change.
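
The appearance-similarity step can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define how best-prototype pairs are searched, so the code below assumes a simple rule (each query location is paired with its most similar support prototype under cosine similarity, and kept if the similarity passes a threshold). The function names, the `threshold` value, and the toy three-dimensional features are all hypothetical stand-ins for ViT patch features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def candidate_keypoints(prototypes, query_feats, threshold=0.9):
    """Pair each query location with its most similar support prototype.

    prototypes:  {keypoint_name: feature vector} taken from the support image
    query_feats: {(row, col): feature vector} taken from the query image
    Returns (location, keypoint_name, similarity) for pairs at or above threshold.
    The result is instance-unaware: nothing here says which object a point is on.
    """
    candidates = []
    for loc, feat in query_feats.items():
        name, sim = max(
            ((n, cosine(p, feat)) for n, p in prototypes.items()),
            key=lambda pair: pair[1],
        )
        if sim >= threshold:
            candidates.append((loc, name, sim))
    return candidates

# Toy data (assumed): two keypoint prototypes, four query locations.
prototypes = {"tip": [1.0, 0.0, 0.0], "base": [0.0, 1.0, 0.0]}
query_feats = {
    (2, 3): [0.9, 0.1, 0.0],   # resembles "tip" on one instance
    (7, 8): [1.0, 0.05, 0.0],  # resembles "tip" on another instance
    (4, 4): [0.1, 0.95, 0.0],  # resembles "base"
    (0, 0): [0.0, 0.0, 1.0],   # background, resembles neither prototype
}
result = candidate_keypoints(prototypes, query_feats)
```

Because the same prototype can match several query locations, the output may contain the same keypoint type more than once, which is why a later grouping step (the sub-graph division described in the abstract) is needed to assign candidates to object instances.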

Index terms

Deep Learning for Visual Perception; Computer Vision for Automation; Visual Learning