ICRA 2026

Transferring Policy of Offline Reinforcement Learning from Hybrid Dataset to Real World Via Progressive Neural Network

Pengyu Zhao, Zheng Fang, Tongxu Ai, Eric Nichols, Randy Gomez, Bo He, Guangliang Li

AI summary

Combining diverse simulation data with real-world demonstrations via Progressive Neural Networks enables faster, safer, and more robust offline-to-real-world policy transfer for robotics.

Offline Reinforcement Learning, Progressive Neural Networks, Sim-to-Real Transfer, Hybrid Datasets, Robotic Manipulation, Policy Transfer

Problem

Offline reinforcement learning struggles with limited data diversity and the sim-to-real distributional gap, which often causes policy extrapolation errors and catastrophic failures when deployed on physical robots.

Approach

The method trains an agent on a hybrid dataset of high-diversity simulation data and high-quality real-world demonstrations, then transfers the offline policy to the real world using a Progressive Neural Network to preserve learned knowledge while adapting to new dynamics.
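
As an illustration of the hybrid-dataset idea, the sketch below draws each training minibatch from two pools of transitions: a small real-world pool and a larger simulated one. It is not the authors' implementation. The function name `sample_hybrid_batch`, the `real_fraction` mixing ratio, and the transition layout `(state, action, reward, next_state, source)` are all assumptions made for this example; the summary does not say how the two data sources are mixed.

```python
import random


def sample_hybrid_batch(sim_data, real_data, batch_size, real_fraction=0.5, rng=None):
    """Draw one minibatch that mixes real and simulated transitions.

    Each transition is a tuple (state, action, reward, next_state, source).
    `real_fraction` is a hypothetical mixing ratio, not a value from the paper.
    """
    if not sim_data or not real_data:
        raise ValueError("both data pools must be non-empty")
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must lie in [0, 1]")

    rng = rng or random.Random()
    n_real = round(batch_size * real_fraction)
    n_sim = batch_size - n_real

    # Sample with replacement so a small real-world pool can still fill its share.
    batch = [rng.choice(real_data) for _ in range(n_real)]
    batch += [rng.choice(sim_data) for _ in range(n_sim)]
    rng.shuffle(batch)
    return batch


# Toy pools: both sources share the same state layout, as the paper assumes.
sim_pool = [((0.0, 0.1), (0.5,), 0.0, (0.1, 0.1), "sim") for _ in range(100)]
real_pool = [((0.0, 0.1), (0.4,), 1.0, (0.1, 0.2), "real") for _ in range(10)]
batch = sample_hybrid_batch(sim_pool, real_pool, batch_size=8, real_fraction=0.25,
                            rng=random.Random(0))
```

Tagging each transition with its source keeps the two pools distinguishable downstream, for example when checking how much of a batch comes from real-world data.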

Key results

Hybrid dataset accelerates offline policy learning
PNN transfer mitigates sim-to-real distributional shift
Faster online fine-tuning and higher real-world task performance
Early-stage simulation data yields greater training benefits
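
The PNN transfer named in the results above can be pictured with a minimal two-column sketch in plain Python. The first column stands in for the frozen offline policy. The second column is the part that would be trained online, and it receives the first column's hidden features through a lateral connection. Layer sizes, the class names `FrozenColumn` and `NewColumn`, and the weight values are assumptions for illustration only; the paper's actual architecture and training procedure are not given in this summary.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


class FrozenColumn:
    """Column 1: the offline policy. Its weights are never updated after transfer."""

    def __init__(self, W1, W2):
        self.W1, self.W2 = W1, W2

    def hidden(self, x):
        return relu(matvec(self.W1, x))

    def forward(self, x):
        return matvec(self.W2, self.hidden(x))


class NewColumn:
    """Column 2: trained online. Only W1, W2 and the lateral weights U2 would be updated."""

    def __init__(self, prev, W1, W2, U2):
        self.prev = prev  # frozen column, used for feature reuse only
        self.W1, self.W2, self.U2 = W1, W2, U2

    def forward(self, x):
        h_prev = self.prev.hidden(x)         # features from the frozen column
        h_new = relu(matvec(self.W1, x))     # features learned by this column
        own = matvec(self.W2, h_new)
        lateral = matvec(self.U2, h_prev)    # lateral connection from column 1
        return [a + b for a, b in zip(own, lateral)]


# Hypothetical weights: 2-dimensional state, 2 hidden units, 1-dimensional action.
offline = FrozenColumn(W1=[[1.0, 0.0], [0.0, 1.0]], W2=[[2.0, 3.0]])
# Assumed initialisation for this example: zero output weights in the new column and
# lateral weights copied from column 1, so the new column starts as the offline policy.
online = NewColumn(offline, W1=[[0.5, -0.5], [0.3, 0.3]], W2=[[0.0, 0.0]], U2=[[2.0, 3.0]])

state = [1.0, -2.0]
offline_action = offline.forward(state)
online_action = online.forward(state)
```

Because the first column is never modified, the offline policy stays recoverable however the second column changes during online learning, which is the knowledge-retention property the summary attributes to PNN transfer.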

Why it matters

Enables safe, sample-efficient deployment of reinforcement learning policies on physical robots by bridging the simulation-reality gap without extensive real-world trial-and-error.

Abstract

Offline reinforcement learning (Offline RL) provides a compelling solution for applying RL in high-risk or resource-constrained real-world domains such as healthcare, autonomous driving, and robotic manipulation, where online exploration can be unsafe or impractical. However, Offline RL faces critical challenges arising from limited data coverage and a potential distributional mismatch between the pre-training dataset and the real-world environment. In this paper, we propose to let an agent learn from a hybrid dataset that combines high-quality real-world data with high-diversity simulation data. We assume that the dynamics of the simulation and the real world do not match, but that the state space is the same. To address the policy extrapolation error and potentially catastrophic failures caused by out-of-distribution actions and the sim-to-real gap, we use progressive neural networks (PNNs) to transfer the offline policy to the real world. Results on two robotic manipulation tasks with a six-degree-of-freedom Ned robotic arm show that the hybrid dataset facilitates faster offline learning and better adaptation to real-world tasks during online learning. Further analysis shows that transferring the offline policy via a PNN not only retains the policy learned from the hybrid dataset and bridges the gap between simulation and real-world data, but also allows the agent to explore a more diverse distribution of samples during online learning.

Index terms

Reinforcement Learning, Transfer Learning, Motion Control