ICRA 2026

RM-RL: Role-Model Reinforcement Learning for Precise Robot Manipulation

Xiangyu Chen, Chuhao Zhou, Yuxi Liu, Jianfei Yang

AI summary

Automatically labeling real-world interaction data with a role-model strategy enables stable, data-efficient reinforcement learning for millimeter-precision robot manipulation without human demonstrations.

Role-Model RL, Real-World Robotics, Precise Manipulation, Online-Offline Training, Data Efficiency, Robot Policy Learning

Problem

High-precision real-world robot manipulation typically relies on expert demonstrations that are difficult and time-consuming to collect, while offline reinforcement learning commonly suffers from distribution shift and low data efficiency.

Approach

For each online sample, the framework automatically takes the highest-reward action observed from similar initial states as its label (the role model). This recasts policy learning as supervised training, and the labeled samples are reused through a hybrid online-offline replay scheme.
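The labeling step can be illustrated with a minimal sketch. It is not the authors' implementation: the `Sample` type, the `role_model_labels` function, and the use of a Euclidean `radius` to decide which initial states count as "similar" are all assumptions made for illustration, since the summary does not specify the similarity criterion.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """One online interaction (hypothetical structure, not from the paper)."""
    state: Tuple[float, ...]   # initial state, e.g. an observed object pose
    action: Tuple[float, ...]  # action the policy executed
    reward: float              # scalar reward measured after execution


def role_model_labels(samples: List[Sample], radius: float) -> List[Tuple[float, ...]]:
    """For each sample, return the action of the highest-reward sample whose
    initial state lies within `radius` of it (Euclidean distance, an assumption)."""
    labels = []
    for s in samples:
        # A sample is always its own neighbour, so this list is never empty.
        neighbours = [t for t in samples if math.dist(s.state, t.state) <= radius]
        best = max(neighbours, key=lambda t: t.reward)
        labels.append(best.action)
    return labels


samples = [
    Sample(state=(0.0, 0.0), action=(1.0,), reward=0.2),
    Sample(state=(0.1, 0.0), action=(2.0,), reward=0.9),
    Sample(state=(5.0, 5.0), action=(3.0,), reward=0.1),
]
labels = role_model_labels(samples, radius=0.5)
print(labels)  # [(2.0,), (2.0,), (3.0,)]
```

The first two samples start from nearby states, so both receive the better of their two actions as a label; the third has no close neighbour and keeps its own action. The resulting (state, label) pairs could then be fitted with an ordinary regression loss, which is the sense in which policy learning becomes supervised training.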

Key results

53% improvement in translation accuracy and 20% in rotation accuracy
Faster and more stable policy convergence in real-world trials
Successful millimeter-precision cell plate placement where baselines fail
Eliminates need for human demonstrations while mitigating distribution shift

Why it matters

Provides a practical, demonstration-free training pathway for deploying high-precision robotic policies in delicate biological and chemical laboratory settings.

Abstract

Precise robot manipulation is critical for fine-grained applications such as chemical and biological experiments, where even small errors (e.g., reagent spillage) can invalidate an entire task. Existing approaches often rely on pre-collected expert demonstrations and train policies via imitation learning (IL) or offline reinforcement learning (RL). However, obtaining high-quality demonstrations for precision tasks is difficult and time-consuming, while offline RL commonly suffers from distribution shifts and low data efficiency. We introduce a Role-Model Reinforcement Learning (RM-RL) framework that unifies online and offline training in real-world environments. The key idea is a role-model strategy that automatically generates labels for online training data using approximately optimal actions, eliminating the need for human demonstrations. RM-RL reformulates policy learning as supervised training, reducing instability from distribution mismatch and improving efficiency. A hybrid training scheme further leverages online role-model data for offline reuse, enhancing data efficiency through repeated sampling. Extensive experiments show that RM-RL converges faster and more stably than existing RL methods, yielding significant gains in real-world manipulation: 53% improvement in translation accuracy and 20% in rotation accuracy. Finally, we demonstrate the successful execution of a challenging task, precisely placing a cell plate onto a shelf, highlighting the framework’s effectiveness where prior methods fail. Project site: https://ntumars.github.io/project/RMRL
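The hybrid training scheme mentioned in the abstract can be pictured as mixing freshly labeled online pairs with previously stored ones in each training batch. The sketch below is only one plausible reading under stated assumptions: the `hybrid_batch` function, the fixed `online_fraction` mixing ratio, and the buffer layout are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple

# (state, role-model action label); the layout is an assumption for illustration.
Pair = Tuple[Tuple[float, ...], Tuple[float, ...]]


def hybrid_batch(online: List[Pair], offline: List[Pair], batch_size: int,
                 online_fraction: float, rng: random.Random) -> List[Pair]:
    """Draw a batch that mixes fresh online pairs with stored offline pairs."""
    if not 0.0 <= online_fraction <= 1.0:
        raise ValueError("online_fraction must lie in [0, 1]")
    # Cap each share by what the corresponding buffer actually holds.
    n_online = min(len(online), round(batch_size * online_fraction))
    n_offline = min(len(offline), batch_size - n_online)
    batch = rng.sample(online, n_online) + rng.sample(offline, n_offline)
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
# Toy data: fresh pairs have state values below 100, stored pairs at or above 100.
online = [((float(i),), (0.0,)) for i in range(4)]
offline = [((float(100 + i),), (1.0,)) for i in range(20)]

batch = hybrid_batch(online, offline, batch_size=8, online_fraction=0.25, rng=rng)
n_fresh = sum(1 for state, _ in batch if state[0] < 100)

# After a training round, fresh pairs move to the offline buffer for later reuse.
offline.extend(online)
online.clear()
```

Here each batch of 8 contains 2 fresh pairs and 6 stored ones. Moving the online pairs into the offline buffer after each round is what lets the same real-world sample be drawn repeatedly in later batches.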

Index terms

Reinforcement Learning, AI-Enabled Robotics, Deep Learning in Grasping and Manipulation