NFPDE: Normalizing Flow-Based Parameter Distribution Estimation for Offline Adaptive Domain Randomization
Rin Takano, Kei Takaya, Hiroyuki Oyama
Abstract
Reinforcement learning with domain randomization (DR) has been proposed as a promising approach for learning policies that are robust to environmental changes. However, for DR to work well in real-world environments, appropriate DR distributions must be designed for the model parameters. This paper proposes Normalizing Flow-based Parameter Distribution Estimation (NFPDE), a new method for estimating DR distributions. NFPDE models the target distribution with a normalizing flow, a flow-based generative model, and estimates it from an offline dataset collected in advance in the target environment. Through numerical experiments in OpenAI Gym, we show that NFPDE estimates the target distribution more accurately and efficiently than previous estimation methods. We also show that the estimated DR distributions can improve the robustness of trained policies.
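As general background on the modeling idea named above, a normalizing flow defines a density by pushing a simple base distribution through an invertible map and correcting with the change-of-variables term. The following is a minimal sketch under strong assumptions, not the NFPDE architecture or estimation procedure: it uses a single one-dimensional affine transform, assumes (unlike the offline-dataset setting described in the abstract) that samples of the parameter itself are directly available, and uses made-up sample values. The function names are hypothetical.

```python
import math


def base_log_prob(z):
    """Log-density of the standard normal base distribution."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def flow_log_prob(x, shift, log_scale):
    """Log-density of x under the affine flow x = shift + exp(log_scale) * z."""
    z = (x - shift) * math.exp(-log_scale)  # inverse transform x -> z
    # Change of variables: log p(x) = log p_z(z) + log|dz/dx| = log p_z(z) - log_scale
    return base_log_prob(z) - log_scale


def fit_affine_flow(samples):
    """Maximum-likelihood shift and log_scale (closed form for an affine flow)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, 0.5 * math.log(var)


# Hypothetical samples of one physical parameter (illustrative values only).
samples = [0.9, 1.0, 1.1, 1.2, 0.8]
shift, log_scale = fit_affine_flow(samples)
print(shift, math.exp(log_scale), flow_log_prob(1.0, shift, log_scale))
```

A practical flow would stack several learnable invertible layers and fit them by gradient-based optimization. The affine case is used here only because its maximum-likelihood solution is available in closed form.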