IROS 2024

NFPDE: Normalizing Flow-Based Parameter Distribution Estimation for Offline Adaptive Domain Randomization

Rin Takano, Kei Takaya, Hiroyuki Oyama

Abstract

Reinforcement learning with domain randomization (DR) has been proposed as a promising approach for learning policies that are robust to environmental changes. However, for DR to work well in real-world environments, appropriate DR distributions must be designed for the model parameters. This paper proposes Normalizing Flow-based Parameter Distribution Estimation (NFPDE), a new method for estimating DR distributions. NFPDE models the target distribution with a flow-based generative model (a normalizing flow) and estimates it from an offline dataset collected a priori in the target environment. Through numerical experiments in the OpenAI Gym environment, we show that NFPDE can estimate the target distribution more accurately and efficiently than previous estimation methods. We also show that the estimated DR distributions can improve the robustness of trained policies.
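
To make the idea of a flow-based density model concrete, the following is a minimal sketch of the change-of-variables principle that normalizing flows rely on. It is not the NFPDE architecture or training procedure, which the abstract does not specify. It assumes a single affine layer, a one-dimensional parameter, and, unlike an offline trajectory dataset, directly observed parameter samples. The names affine_flow_log_prob and fit_affine_flow and the friction-like values 1.5 and 0.2 are hypothetical.

```python
import numpy as np


def affine_flow_log_prob(x, shift, log_scale):
    """Log-density of x under the flow x = shift + exp(log_scale) * z, z ~ N(0, 1)."""
    z = (x - shift) * np.exp(-log_scale)            # inverse transform x -> z
    base_log_prob = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
    return base_log_prob - log_scale                # change of variables: -log|dx/dz|


def fit_affine_flow(samples, steps=5000, lr=0.02):
    """Fit shift and log_scale by gradient ascent on the mean log-likelihood."""
    shift, log_scale = 0.0, 0.0
    for _ in range(steps):
        z = (samples - shift) * np.exp(-log_scale)
        grad_shift = np.mean(z) * np.exp(-log_scale)   # d/d(shift) of mean log p
        grad_log_scale = np.mean(z ** 2) - 1.0         # d/d(log_scale) of mean log p
        shift += lr * grad_shift
        log_scale += lr * grad_log_scale
    return shift, log_scale


# Hypothetical data: samples of one physical parameter (assumed observable here).
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.5, scale=0.2, size=1000)

shift, log_scale = fit_affine_flow(samples)

# Drawing from the fitted flow gives parameter values that could be used for DR.
dr_samples = shift + np.exp(log_scale) * rng.standard_normal(5)
```

With a single affine layer the fitted flow is just a Gaussian, so the result coincides with the sample mean and standard deviation. A practical flow would stack several invertible layers to capture non-Gaussian parameter distributions, with the same log-determinant correction applied per layer.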

Index terms

Machine Learning for Robot Control; Model Learning for Control; Reinforcement Learning