ICRA 2024

Visual-Policy Learning through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks

Cihan Acar, Kuluhan Binici, Alp Tekırdag, Yan Wu

Abstract

The use of multi-camera views simultaneously has been shown to improve the generalization capabilities and performance of visual policies. However, using multiple cameras in real-world scenarios can be challenging. In this study, we present a novel approach to enhance the generalization performance of vision-based Reinforcement Learning (RL) algorithms for robotic manipulation tasks. Our proposed method involves utilizing a technique known as knowledge distillation, in which a “teacher” policy, pre-trained with multiple camera viewpoints, guides a “student” policy in learning from a single camera viewpoint. To enhance the student policy’s robustness against camera location perturbations, it is trained using data augmentation and extreme viewpoint changes. As a result, the student policy learns robust visual features that allow it to locate the object of interest accurately and consistently, regardless of the camera viewpoint. The efficacy and efficiency of the proposed method were evaluated in both simulation and real-world environments. The results demonstrate that the single-view visual student policy can successfully learn to grasp and lift a challenging object, which was not possible with a single-view policy alone. Furthermore, the student policy demonstrates zero-shot transfer capability, where it can successfully grasp and lift objects in real-world scenarios for unseen visual configurations.
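
To make the teacher-student idea concrete, the following is a minimal sketch of one possible distillation step, not the authors' implementation. It assumes a mean-squared-error loss between teacher and student actions, a teacher that simply averages per-view features, and a linear student; the abstract does not specify the loss or the architectures, so all function names (`teacher_policy`, `student_policy`, `distillation_loss`, `distill_step`) are hypothetical. The data augmentation and viewpoint perturbation mentioned above are omitted for brevity.

```python
def teacher_policy(views):
    """Hypothetical stand-in for a pre-trained multi-view teacher.

    Assumption: the teacher's action is the element-wise mean of the
    feature vectors from all camera views.
    """
    n_views = len(views)
    dim = len(views[0])
    return [sum(v[i] for v in views) / n_views for i in range(dim)]


def student_policy(view, weights):
    """Hypothetical linear student that sees only a single camera view."""
    return [w * x for w, x in zip(weights, view)]


def distillation_loss(student_action, teacher_action):
    """Assumed loss: mean squared error between student and teacher actions."""
    n = len(teacher_action)
    return sum((s - t) ** 2 for s, t in zip(student_action, teacher_action)) / n


def distill_step(weights, view, teacher_action, lr=0.1):
    """One gradient-descent step on the MSE loss for the linear student."""
    student_action = student_policy(view, weights)
    n = len(weights)
    # d(loss)/d(w_i) = 2 * (w_i * x_i - t_i) * x_i / n
    grads = [2.0 * (s - t) * x / n
             for s, t, x in zip(student_action, teacher_action, view)]
    return [w - lr * g for w, g in zip(weights, grads)]


if __name__ == "__main__":
    # Two toy "camera views"; the student only ever sees the first one.
    views = [[1.0, 2.0], [3.0, 4.0]]
    target = teacher_policy(views)
    weights = [0.0, 0.0]
    for _ in range(200):
        weights = distill_step(weights, views[0], target)
    print(distillation_loss(student_policy(views[0], weights), target))
```

In this toy setting the student's loss against the teacher's action shrinks toward zero as the steps accumulate. A real system would replace both policies with image-based networks and feed the student randomly augmented single-camera observations.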

Index terms

Deep Learning in Grasping and Manipulation, Learning from Experience, Transfer Learning