Offline Meta-Reinforcement Learning with Evolving Gradient Agreement
Jiaxing Chen, Weilin Yuan, Shaofei Chen, Furong Liu, Ao Ma, Zhenzhen Hu, Peng Li
Abstract
Meta-Reinforcement Learning (Meta-RL) is a machine learning paradigm that aims to learn reinforcement learning policies capable of quickly adapting to unseen tasks from few-shot data. However, applying Meta-RL to real-world problems is challenging because of the cost of data acquisition. Offline Meta-RL has emerged as a promising solution to this problem: it learns, from pre-collected data, policies that can adapt effectively and rapidly to unseen tasks. In this paper, we propose a new offline Meta-RL method called Meta-Actor-Critic with Evolving Gradient Agreement (MACEGA). MACEGA uses an evolutionary approach to estimate meta-gradients that are conducive to generalization across unseen tasks. During meta-training, gradient evolution is used to meta-update the value network and the policies. Moreover, we use gradient agreement as an optimization objective for meta-learning, thereby enhancing the generalization ability of the meta-policy. We experimentally demonstrate that MACEGA is robust to the quality of the offline data. Furthermore, extensive experiments on various benchmarks provide empirical evidence that MACEGA outperforms previous state-of-the-art methods in generalizing to unseen tasks, demonstrating its potential for real-world applications.
Keywords: Offline meta-reinforcement learning, meta-reinforcement learning, evolving gradient, gradient agreement, generalization