ICRA 2024

K-VIL: Keypoints-Based Visual Imitation Learning

Jianfeng Gao, Zhi Tao, Noémie Jaquier, Tamim Asfour


Abstract

Visual imitation learning provides efficient and intuitive solutions for robotic systems to acquire novel manipulation skills. However, simultaneously learning geometric task constraints and control policies from visual inputs alone remains a challenging problem. In this paper, we propose the keypoint-based visual imitation learning (K-VIL) approach, which automatically extracts sparse, object-centric, and embodiment-independent task representations from a small number of human demonstration videos. The task representation is composed of keypoint-based geometric constraints on principal manifolds, their associated local frames, and the movement primitives required for task execution. Our approach can extract such task representations from a single demonstration video and incrementally update them as new demonstrations become available. To reproduce manipulation skills in novel scenes using the learned set of prioritized geometric constraints, we introduce a novel keypoint-based admittance controller. We evaluate our approach in several real-world applications, showcasing its ability to deal with cluttered scenes, viewpoint mismatch, new instances of categorical objects, and large variations in object pose and shape. Our evaluation demonstrates the efficiency and robustness of our approach in both one-shot and few-shot imitation learning settings. Videos and source code are available at https://sites.google.com/view/k-vil.

Index terms

Learning from Demonstration, Visual Learning, Manipulation Planning, Learning of Geometric Constraints