IROS 2024

Raising Body Ownership in End-To-End Visuomotor Policy Learning Via Robot-Centric Pooling

Zheyu Zhuang, Ville Kyrki, Danica Kragic


Abstract

We present Robot-centric Pooling (RcP), a novel pooling method designed to enhance end-to-end visuomotor policies by enabling differentiation between the robots and similar entities or their surroundings. Given an image-proprioception pair, RcP guides the aggregation of image features by highlighting image regions correlating with the robot’s proprioceptive states, thereby extracting robot-centric image representations for policy learning. Leveraging contrastive learning techniques, RcP integrates seamlessly with existing visuomotor policy learning frameworks and is trained jointly with the policy using the same dataset, requiring no extra data collection involving self-distractors. We evaluate the proposed method with reaching tasks in both simulated and real-world settings. The results demonstrate that RcP significantly enhances the policies’ robustness against various unseen distractors, including self-distractors, positioned at different locations. Additionally, the inherent robot-centric characteristic of RcP enables the learnt policy to be far more resilient to aggressive pixel shifts compared to the baselines. Code available at: https://github.com/Zheyu-Zhuang/RcP
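The idea of aggregating image features with weights driven by the robot's proprioceptive state can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes, purely for illustration, that the proprioceptive state has already been embedded into a query vector in the same space as the image features, and that pooling is a softmax-weighted average over image locations. The function names `softmax` and `proprio_guided_pool` and the toy values are hypothetical, and the contrastive training objective mentioned in the abstract is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def proprio_guided_pool(features, query):
    """Pool per-location image features with weights from a proprioceptive query.

    features: list of N feature vectors (one per image location), each of length C.
    query:    length-C vector, ASSUMED to be an embedding of the robot's
              proprioceptive state (the embedding step is omitted here).
    Returns (pooled, weights): the length-C weighted average and the N weights.
    """
    channels = len(query)
    scale = math.sqrt(channels)
    # Similarity between each location's feature and the proprioceptive query.
    scores = [
        sum(f_c * q_c for f_c, q_c in zip(f, query)) / scale
        for f in features
    ]
    # Locations that correlate with the query receive larger weights.
    weights = softmax(scores)
    # Weighted average of the features, channel by channel.
    pooled = [
        sum(w * f[c] for w, f in zip(weights, features))
        for c in range(channels)
    ]
    return pooled, weights


# Toy input: four image locations with two-channel features (made-up values).
features = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0], [0.0, 3.0]]
query = [4.0, 0.0]  # hypothetical proprioceptive embedding
pooled, weights = proprio_guided_pool(features, query)
```

In this toy input, only the first location aligns with the query, so it dominates the pooled representation while the other locations, standing in for distractors, contribute very little.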

Index terms

Perception for Grasping and Manipulation; Visual Servoing