Transferring Policy of Offline Reinforcement Learning from Hybrid Dataset to Real World Via Progressive Neural Network
Pengyu Zhao, Zheng Fang, Tongxu Ai, Eric Nichols, Randy Gomez, Bo He, Guangliang Li
AI summary
Problem
Offline reinforcement learning struggles with limited data diversity and the sim-to-real distributional gap, which often causes policy extrapolation errors and catastrophic failures when deployed on physical robots.
Approach
The method trains an agent on a hybrid dataset of high-diversity simulation data and high-quality real-world demonstrations, then transfers the offline policy to the real world using a Progressive Neural Network to preserve learned knowledge while adapting to new dynamics.
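In the standard PNN formulation, the previously trained column is frozen and a new column is added that reads the frozen column's hidden features through lateral connections. This is the mechanism by which a PNN preserves learned knowledge while adapting to new dynamics. The sketch below is a minimal NumPy forward pass, not the authors' implementation. It assumes two MLP columns of equal depth, ReLU hidden layers, a linear action head, and hypothetical layer sizes. Training is omitted; an online RL update would optimize only `trainable_parameters()`, and that method name is itself hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class MLPColumn:
    """One PNN column: a plain MLP stored as a list of (W, b) pairs."""

    def __init__(self, sizes, rng):
        self.layers = [
            (rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def forward(self, x):
        """Return (hidden activations, output); ReLU on hidden layers, linear output."""
        hiddens, h = [], x
        for k, (W, b) in enumerate(self.layers):
            h = h @ W + b
            if k < len(self.layers) - 1:
                h = relu(h)
                hiddens.append(h)
        return hiddens, h


class ProgressiveNet:
    """Two-column PNN: a frozen source column plus a new trainable column.

    Layer k (k >= 1) of the new column also receives the frozen column's
    (k-1)-th hidden activation through a lateral weight matrix.
    Assumption: both columns have the same depth.
    """

    def __init__(self, frozen, sizes, rng):
        self.frozen = frozen  # offline-trained column; never updated
        self.column = MLPColumn(sizes, rng)
        frozen_hidden = [W.shape[1] for W, _ in frozen.layers[:-1]]
        self.lateral = [
            rng.normal(0.0, 0.1, (frozen_hidden[k - 1], sizes[k + 1]))
            for k in range(1, len(sizes) - 1)
        ]

    def forward(self, x):
        frozen_hiddens, _ = self.frozen.forward(x)  # read-only features
        h = x
        n = len(self.column.layers)
        for k, (W, b) in enumerate(self.column.layers):
            z = h @ W + b
            if k >= 1:
                # Lateral connection from the frozen column's previous layer.
                z = z + frozen_hiddens[k - 1] @ self.lateral[k - 1]
            h = relu(z) if k < n - 1 else z
        return h

    def trainable_parameters(self):
        """Hypothetical helper: only the new column and lateral adapters are trained."""
        params = [p for layer in self.column.layers for p in layer]
        return params + list(self.lateral)


rng = np.random.default_rng(0)
obs_dim, act_dim = 10, 4  # hypothetical sizes, not taken from the paper
source = MLPColumn([obs_dim, 64, 64, act_dim], rng)  # stands in for the offline policy
pnn = ProgressiveNet(source, [obs_dim, 64, 64, act_dim], rng)
actions = pnn.forward(rng.normal(size=(8, obs_dim)))
print(actions.shape)  # (8, 4)
```

Setting every lateral matrix to zero reduces the new column to an ordinary MLP. The lateral adapters are therefore the only path by which the frozen offline policy's features influence the column adapted to real-world dynamics.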
Key results
- Hybrid dataset accelerates offline policy learning
- PNN transfer mitigates sim-to-real distributional shift
- Faster online fine-tuning and higher real-world task performance
- Early-stage simulation data yields greater training benefits
Why it matters
Enables safe, sample-efficient deployment of reinforcement learning policies on physical robots by bridging the simulation-reality gap without extensive real-world trial-and-error.
Abstract
Offline reinforcement learning (Offline RL) provides a compelling solution for applying RL in high-risk or resource-constrained real-world domains such as healthcare, autonomous driving, and robotic manipulation, where online exploration can be unsafe or impractical. However, Offline RL faces critical challenges arising from limited data coverage and a potential distributional mismatch between the pre-training dataset and the real-world environment. In this paper, we propose allowing an agent to learn from a hybrid dataset that combines high-quality real-world data with high-diversity simulation data, assuming that the dynamics of the simulation and the real world differ but the state space is the same. To address policy extrapolation error and the potentially catastrophic failures caused by out-of-distribution actions and the sim-to-real gap, we use progressive neural networks (PNNs) to transfer the offline policy to the real world. Results on two robotic manipulation tasks with a six-degree-of-freedom Ned robotic arm show that the hybrid dataset facilitates faster offline learning and better adaptation to real-world tasks during online learning. In addition, further analysis shows that transferring the offline policy via a PNN not only effectively retains the policy learned from the hybrid dataset and bridges the gap between simulation and real-world data, but also allows the agent to explore a more diverse distribution of samples during online learning.
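A hybrid dataset can be consumed by drawing each offline training minibatch from both sources. Because the two sources share the same state space, their transition arrays can be concatenated directly even though their dynamics differ. The sketch below is illustrative only: the `real_fraction` mixing ratio, the `is_real` domain flag, the helper names, and the random placeholder data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def make_transitions(n, obs_dim, act_dim, rng):
    """Hypothetical placeholder data: (obs, action, reward, next_obs, done) arrays."""
    return {
        "obs": rng.normal(size=(n, obs_dim)),
        "action": rng.uniform(-1.0, 1.0, size=(n, act_dim)),
        "reward": rng.normal(size=n),
        "next_obs": rng.normal(size=(n, obs_dim)),
        "done": np.zeros(n, dtype=bool),
    }


def sample_hybrid_batch(real, sim, batch_size, real_fraction, rng):
    """Draw a minibatch mixing real-world and simulation transitions.

    Both datasets share one state space, so their arrays concatenate directly.
    real_fraction is a hypothetical knob; no mixing ratio is given here.
    """
    n_real = int(round(batch_size * real_fraction))
    n_sim = batch_size - n_real
    idx_real = rng.integers(0, len(real["reward"]), size=n_real)
    idx_sim = rng.integers(0, len(sim["reward"]), size=n_sim)
    batch = {k: np.concatenate([real[k][idx_real], sim[k][idx_sim]]) for k in real}
    # Domain flag (True = real, False = sim), useful for logging or per-source weighting.
    batch["is_real"] = np.concatenate([np.ones(n_real, bool), np.zeros(n_sim, bool)])
    return batch


rng = np.random.default_rng(0)
real = make_transitions(500, 10, 4, rng)     # small, high-quality real-world set
sim = make_transitions(20_000, 10, 4, rng)   # large, high-diversity simulation set
batch = sample_hybrid_batch(real, sim, batch_size=256, real_fraction=0.25, rng=rng)
print(batch["obs"].shape, int(batch["is_real"].sum()))  # (256, 10) 64
```

A fixed per-batch ratio keeps the scarce real-world transitions from being swamped by the much larger simulation set. Uniform sampling over the concatenated data would be the simpler alternative.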