RM-RL: Role-Model Reinforcement Learning for Precise Robot Manipulation
Xiangyu Chen, Chuhao Zhou, Yuxi Liu, Jianfei Yang
AI summary
Problem
High-precision real-world robot manipulation typically relies on expert demonstrations that are costly to collect, while offline reinforcement learning suffers from distribution shift and low data efficiency.
Approach
The framework automatically identifies the highest-reward action from similar initial states to label online data, reformulating policy learning as supervised training and reusing samples through a hybrid online-offline replay scheme.
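As a rough illustration of the role-model labeling idea described above, the sketch below picks, for each collected sample, the action of the highest-reward sample among those with a nearby initial state. This is not the authors' implementation: the `Sample` fields, the Euclidean distance, and the `radius` threshold are all assumptions made for the example, since the summary does not specify how "similar" states are defined.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Sample:
    state: tuple   # initial-state features (hypothetical representation)
    action: tuple  # action executed from that state
    reward: float  # reward observed after executing the action


def role_model_labels(samples, radius):
    """Return one label per sample: the action of the best-rewarded neighbour.

    Two samples are treated as neighbours when their initial states lie
    within `radius` of each other (an assumed similarity criterion).
    """
    labels = []
    for s in samples:
        # Every sample is its own neighbour, so this list is never empty.
        neighbours = [t for t in samples if dist(s.state, t.state) <= radius]
        role_model = max(neighbours, key=lambda t: t.reward)
        labels.append(role_model.action)
    return labels


samples = [
    Sample(state=(0.0, 0.0), action=(1.0,), reward=0.2),
    Sample(state=(0.1, 0.0), action=(2.0,), reward=0.9),
    Sample(state=(5.0, 5.0), action=(3.0,), reward=0.1),
]
labels = role_model_labels(samples, radius=0.5)
```

The resulting (state, label) pairs can then be used as ordinary supervised regression targets, which is the sense in which policy learning is recast as supervised training.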
Key results
- 53% improvement in translation accuracy and 20% in rotation accuracy
- Faster and more stable policy convergence in real-world trials
- Successful millimeter-precision cell plate placement where baselines fail
- Eliminates the need for human demonstrations while mitigating distribution shift
Why it matters
Provides a practical, demonstration-free training pathway for deploying high-precision robotic policies in delicate biological and chemical laboratory settings.
Abstract
Precise robot manipulation is critical for fine-grained applications such as chemical and biological experiments, where even small errors (e.g., reagent spillage) can invalidate an entire task. Existing approaches often rely on pre-collected expert demonstrations and train policies via imitation learning (IL) or offline reinforcement learning (RL). However, obtaining high-quality demonstrations for precision tasks is difficult and time-consuming, while offline RL commonly suffers from distribution shifts and low data efficiency. We introduce a Role-Model Reinforcement Learning (RM-RL) framework that unifies online and offline training in real-world environments. The key idea is a role-model strategy that automatically generates labels for online training data using approximately optimal actions, eliminating the need for human demonstrations. RM-RL reformulates policy learning as supervised training, reducing instability from distribution mismatch and improving efficiency. A hybrid training scheme further leverages online role-model data for offline reuse, enhancing data efficiency through repeated sampling. Extensive experiments show that RM-RL converges faster and more stably than existing RL methods, yielding significant gains in real-world manipulation: 53% improvement in translation accuracy and 20% in rotation accuracy. Finally, we demonstrate the successful execution of a challenging task, precisely placing a cell plate onto a shelf, highlighting the framework's effectiveness where prior methods fail. Project site: https://ntumars.github.io/project/RMRL
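The hybrid training scheme mentioned in the abstract can be pictured as mixing freshly labeled online samples with samples stored from earlier rounds. The sketch below is one possible reading, not the paper's procedure: the function name `hybrid_batch`, the `online_fraction` parameter, and the uniform sampling are hypothetical choices made for illustration.

```python
import random


def hybrid_batch(online, offline_buffer, batch_size, online_fraction=0.5, rng=random):
    """Draw a training batch from fresh online data and a stored buffer.

    `online_fraction` (an assumed knob) sets the share of the batch taken
    from the current round; the rest is resampled from earlier rounds.
    """
    n_online = min(len(online), int(batch_size * online_fraction))
    n_offline = min(len(offline_buffer), batch_size - n_online)
    # Sampling without replacement within each source.
    return rng.sample(online, n_online) + rng.sample(offline_buffer, n_offline)


rng = random.Random(0)                    # seeded for repeatability
offline_buffer = [("old", i) for i in range(20)]
online = [("new", i) for i in range(6)]

batch = hybrid_batch(online, offline_buffer, batch_size=8, rng=rng)

# After the update step, the online samples join the buffer for later reuse.
offline_buffer.extend(online)
```

Under this reading, each labeled sample is used once while fresh and can be drawn again in later rounds, which is where the repeated-sampling efficiency gain would come from.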