ICRA 2026

MoRe-ERL: Learning Motion Residuals Using Episodic Reinforcement Learning

Xi Huang, Hongyi Zhou, Ge Li, Yucheng Tang, Weiran Liao, Björn Hein, Tamim Asfour, Rudolf Lioutikov

AI summary

Leveraging pre-planned reference trajectories to learn targeted motion residuals via episodic reinforcement learning drastically improves sample efficiency and task performance over training from scratch.
Motion Residuals, Episodic Reinforcement Learning, B-Spline Primitives, Trajectory Refinement, Sim-to-Real Transfer, Robotics

Problem

Robotic applications require rapid, smooth motion adaptation to dynamic environments. Learning full trajectories from scratch is sample-inefficient and discards pre-planned behaviors, while step-based residual methods often produce jerky motions and rely on dense Markovian rewards.

Approach

MoRe-ERL uses episodic reinforcement learning to jointly identify critical trajectory segments and generate smooth B-spline-based motion residuals that selectively refine only those segments while preserving essential task maneuvers.
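
The idea of refining only a selected segment with a smooth residual can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 1-D trajectory, the cubic degree, the uniform clamped knot vector, and the choice to pin the two outermost control points at each end to zero are all assumptions made for illustration. In the paper the segment boundaries and residual parameters come from a learned policy; here they are passed in by hand.

```python
# Illustrative sketch only; all names and design choices here are hypothetical.

def clamped_knots(n_ctrl, degree):
    """Uniform clamped knot vector on [0, 1] for n_ctrl control points."""
    n_inner = n_ctrl - degree - 1
    inner = [(j + 1) / (n_inner + 1) for j in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: i-th basis function of degree k evaluated at t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k] - knots[i]
    if d1 > 0:
        left = (t - knots[i]) / d1 * bspline_basis(i, k - 1, t, knots)
    d2 = knots[i + k + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k + 1] - t) / d2 * bspline_basis(i + 1, k - 1, t, knots)
    return left + right


def eval_bspline(ctrl, t, degree=3):
    """Evaluate a clamped B-spline with control points ctrl at t in [0, 1]."""
    if t >= 1.0:
        # The half-open basis intervals exclude the last knot, so handle it here.
        return ctrl[-1]
    knots = clamped_knots(len(ctrl), degree)
    return sum(c * bspline_basis(i, degree, t, knots) for i, c in enumerate(ctrl))


def apply_residual(reference, start, end, free_ctrl, degree=3):
    """Add a B-spline residual to reference[start..end]; leave the rest untouched.

    Assumption: two zero control points are padded on each side so that the
    cubic residual starts and ends at zero with zero first derivative.
    """
    ctrl = [0.0, 0.0] + list(free_ctrl) + [0.0, 0.0]
    refined = list(reference)
    span = end - start  # requires end > start
    for idx in range(start, end + 1):
        t = (idx - start) / span
        refined[idx] = reference[idx] + eval_bspline(ctrl, t, degree)
    return refined


# Usage: a straight-line reference with a bump added on samples 5..15.
reference = [0.1 * i for i in range(21)]
refined = apply_residual(reference, start=5, end=15, free_ctrl=[0.2, 0.5, 0.2])
```

Samples outside the chosen segment are copied from the reference unchanged, which mirrors the summary's point that task-relevant maneuvers elsewhere in the trajectory are preserved.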

Key results

  • First RL algorithm combining episodic RL with residual learning for motion refinement
  • End-to-end policy that jointly identifies critical segments and parameterizes B-spline residuals
  • Superior sample efficiency and task performance compared to training ERL from scratch
  • Successful real-world hardware deployment with minimal sim-to-real gap

Why it matters

Enables robots to adapt quickly and smoothly to dynamic environments with fewer training samples than training ERL from scratch, and allows policies trained in simulation to be deployed directly on real hardware.

Abstract

We propose MoRe-ERL, a framework that combines Episodic Reinforcement Learning (ERL) and residual learning, which refines preplanned reference trajectories into safe, feasible, and efficient task-specific trajectories. This framework is general enough to incorporate into arbitrary ERL methods and motion generators seamlessly. MoRe-ERL identifies trajectory segments requiring modification while preserving critical task-related maneuvers. Then it generates smooth residual adjustments using B-Spline-based movement primitives to ensure adaptability to dynamic task contexts and smoothness in trajectory refinement. Experimental results demonstrate that residual learning significantly outperforms training from scratch using ERL methods, achieving superior sample efficiency and task performance. Hardware evaluations further validate the framework, showing that policies trained in simulation can be directly deployed in real-world systems, exhibiting a minimal sim-to-real gap.

Index terms

Motion and Path Planning, Reinforcement Learning, Integrated Planning and Learning
