MoRe-ERL: Learning Motion Residuals Using Episodic Reinforcement Learning
Xi Huang, Hongyi Zhou, Ge Li, Yucheng Tang, Weiran Liao, Björn Hein, Tamim Asfour, Rudolf Lioutikov
AI summary
Problem
Robotic applications require rapid, smooth motion adaptation to dynamic environments. Learning full trajectories from scratch is sample-inefficient and discards pre-planned behaviors, while step-based residual methods often produce jerky motions and rely on dense Markovian rewards.
Approach
MoRe-ERL uses episodic reinforcement learning to jointly identify critical trajectory segments and generate smooth B-spline-based motion residuals that selectively refine only those segments while preserving essential task maneuvers.
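To make segment-wise refinement concrete, here is a minimal sketch built on assumptions made here, not on the paper's exact parameterization. The residual is a clamped cubic B-spline defined only on a hypothetical segment [t_start, t_end] of normalized time. Its first two and last two control points are pinned to zero, so the residual and its first derivative vanish at the segment boundaries, and it is added to the reference trajectory. The helper names `clamped_knots`, `bspline_basis`, `segment_residual`, and `refine` are illustrative.

```python
def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector on [0, 1] (n_ctrl + degree + 1 entries)."""
    n_inner = n_ctrl - degree - 1
    if n_inner < 0:
        raise ValueError("need at least degree + 1 control points")
    inner = [j / (n_inner + 1) for j in range(1, n_inner + 1)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def bspline_basis(i, degree, u, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function at u."""
    if degree == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty knot span so that u == 1 is covered.
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    left = knots[i + degree] - knots[i]
    if left > 0.0:
        value += (u - knots[i]) / left * bspline_basis(i, degree - 1, u, knots)
    right = knots[i + degree + 1] - knots[i + 1]
    if right > 0.0:
        value += (knots[i + degree + 1] - u) / right * bspline_basis(i + 1, degree - 1, u, knots)
    return value


def segment_residual(t, t_start, t_end, inner_weights, degree=3):
    """Residual at normalized time t; zero outside [t_start, t_end]."""
    if t_end <= t_start:
        raise ValueError("segment must have positive length")
    if not t_start <= t <= t_end:
        return 0.0
    # Two zero control points at each end: residual and slope vanish at the
    # boundaries, so the refined trajectory joins the reference smoothly.
    weights = [0.0, 0.0] + list(inner_weights) + [0.0, 0.0]
    knots = clamped_knots(len(weights), degree)
    u = (t - t_start) / (t_end - t_start)
    return sum(w * bspline_basis(i, degree, u, knots) for i, w in enumerate(weights))


def refine(reference, t, t_start, t_end, inner_weights):
    """Refined trajectory value: reference plan plus segment-local residual."""
    return reference(t) + segment_residual(t, t_start, t_end, inner_weights)


if __name__ == "__main__":
    reference = lambda t: t  # hypothetical 1-DoF reference: a straight ramp
    for k in range(11):
        t = k / 10
        q = refine(reference, t, 0.3, 0.6, [0.2, 0.5, 0.2])
        print(f"t={t:.1f}  ref={reference(t):.3f}  refined={q:.3f}")
```

Outside the segment the refined values equal the reference exactly, which mirrors the goal of preserving maneuvers that do not need modification.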
Key results
- First RL algorithm combining episodic RL with residual learning for motion refinement
- End-to-end policy that jointly identifies critical segments and parameterizes B-spline residuals
- Superior sample efficiency and task performance compared to training ERL from scratch
- Successful real-world hardware deployment with minimal sim-to-real gap
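Because the policy is episodic, one plausible reading of "jointly identifies critical segments and parameterizes B-spline residuals" is that a single action vector carries both the segment location and the control-point weights. The sketch below decodes such a vector under a hypothetical layout (`raw[0]` positions the segment, `raw[1]` sets its width, `raw[2:]` are weights). The function names, `max_amplitude`, and `min_width` are assumptions, not taken from the paper.

```python
import math


def stable_sigmoid(x):
    """Logistic function that does not overflow for large |x|."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_episode_action(raw, max_amplitude=0.1, min_width=0.05):
    """Map one unbounded policy output to (t_start, t_end, weights).

    Hypothetical layout: raw[0] places the segment, raw[1] sets its width,
    raw[2:] are residual control-point weights. Times are normalized to [0, 1].
    """
    if len(raw) < 3:
        raise ValueError("expected position, width and at least one weight")
    # Width lies in [min_width, 1].
    width = min_width + (1.0 - min_width) * stable_sigmoid(raw[1])
    # Scaling the start by (1 - width) keeps t_end = t_start + width <= 1.
    t_start = (1.0 - width) * stable_sigmoid(raw[0])
    t_end = t_start + width
    # tanh bounds every control-point weight to [-max_amplitude, max_amplitude].
    weights = [max_amplitude * math.tanh(w) for w in raw[2:]]
    return t_start, t_end, weights


if __name__ == "__main__":
    print(decode_episode_action([0.0, 0.0, 5.0, -5.0, 0.0]))
```

Squashing rather than clipping is a design choice here: it lets an unconstrained (for example Gaussian) policy output always map to a valid, non-degenerate segment.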
Why it matters
Enables robots to adapt quickly and smoothly to dynamic environments with fewer training samples than learning from scratch, and narrows the gap between simulation and real-world robotic applications.
Abstract
We propose MoRe-ERL, a framework that combines Episodic Reinforcement Learning (ERL) and residual learning to refine preplanned reference trajectories into safe, feasible, and efficient task-specific trajectories. The framework is general enough to be integrated seamlessly with arbitrary ERL methods and motion generators. MoRe-ERL identifies the trajectory segments that require modification while preserving critical task-related maneuvers. It then generates smooth residual adjustments using B-spline-based movement primitives, which adapt to dynamic task contexts while keeping the refined trajectory smooth. Experimental results demonstrate that residual learning significantly outperforms training ERL methods from scratch, achieving superior sample efficiency and task performance. Hardware evaluations further validate the framework: policies trained in simulation can be deployed directly on real-world systems with a minimal sim-to-real gap.
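The claim that MoRe-ERL plugs into arbitrary ERL methods rests on the episodic interface: the learner proposes one residual parameter vector per episode and observes a single return, with no per-step Markovian reward. The sketch below illustrates only that interface. It uses the cross-entropy method as a simple stand-in optimizer (the paper does not name it), a hypothetical toy return, and fixed Gaussian bumps instead of B-splines purely to keep the block short. Initializing the mean at zero means the first rollouts execute the unmodified reference, which is one way to see why residual learning starts from pre-planned behavior rather than from scratch.

```python
import math
import random


def cem_episodic(episode_return, dim, iterations=30, population=32,
                 elite_frac=0.25, init_std=0.1, seed=0):
    """Cross-entropy method over per-episode residual parameters."""
    rng = random.Random(seed)
    mean = [0.0] * dim  # zero residual: first rollouts execute the reference
    std = [init_std] * dim
    n_elite = max(1, int(population * elite_frac))
    for _ in range(iterations):
        # One parameter vector per episode; only its scalar return is observed.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        elites = sorted(samples, key=episode_return, reverse=True)[:n_elite]
        mean = [sum(e[i] for e in elites) / n_elite for i in range(dim)]
        std = [max(1e-3, math.sqrt(sum((e[i] - mean[i]) ** 2 for e in elites) / n_elite))
               for i in range(dim)]
    return mean


# Hypothetical toy task: the 1-DoF reference ramp q_ref(t) = t must pass
# through q = 0.7 at t = 0.5; the residual is a weighted sum of fixed bumps.
CENTRES = [0.25, 0.5, 0.75]


def residual(theta, t):
    return sum(w * math.exp(-((t - c) / 0.1) ** 2) for w, c in zip(theta, CENTRES))


def toy_return(theta):
    q_mid = 0.5 + residual(theta, 0.5)          # reference value plus residual
    penalty = 0.01 * sum(w * w for w in theta)  # stay close to the reference plan
    return -(q_mid - 0.7) ** 2 - penalty


if __name__ == "__main__":
    theta = cem_episodic(toy_return, dim=3)
    print("return at zero residual:", toy_return([0.0, 0.0, 0.0]))
    print("return after CEM:", toy_return(theta))
```

Swapping `cem_episodic` for another black-box or episodic policy-gradient optimizer would leave `toy_return` unchanged, which is the sense in which the residual formulation is optimizer-agnostic in this sketch.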