Domain Adaptation in Visual Reinforcement Learning Via Self-Expert Imitation with Purifying Latent Feature
Lin Chen, Jianan Huang, Zhen Zhou, Yaonan Wang, Yang Mo, Zhiqiang Miao, Kai Zeng, Mingtao Feng, Danwei Wang
Abstract
Generalizing visual reinforcement learning is fundamental to robot visual navigation: a policy is acquired from interactions with source environments and must then adapt to analogous, yet unfamiliar, target environments. Recent advances rely on data augmentation techniques, self-supervised learning methods, and the generative adversarial network framework to train policy neural networks with enhanced generalizability. However, current methods first extract domain-general latent features and then use these features to train the reinforcement learning policy, which degrades the learned policy's ability to guide the agent to accomplish tasks. To tackle this problem, we devise a framework of self-expert imitation with purifying latent features, which enables the policy to achieve robust and stable zero-shot generalization in visually similar, previously unseen domains without diminishing its task performance. We also propose a method, based on the variational autoencoder, for extracting domain-general latent features that enhances their quality. Extensive experiments show that, compared with state-of-the-art counterparts, our policy does not lose task performance after generalization.