ICRA 2026

Failure-Aware RL: Reliable Offline-To-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation

AI summary

Key figure (auto-extracted from paper)
FARL reduces real-world RL intervention-requiring failures by 73.1% while improving task performance through predictive failure avoidance and self-recovery.
Failure-aware RL, offline-to-online RL, real-world robotics, self-recovery, safe exploration, reinforcement learning post-training

Problem

Offline-to-online reinforcement learning for robotics frequently causes irreversible Intervention-requiring Failures during exploration, blocking safe real-world deployment.

Approach

FARL trains a latent world model offline to predict near-future failures; during online policy fine-tuning, a fixed recovery policy (also trained offline) overrides actions that the model flags as risky.
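
The override step described above can be illustrated with a minimal sketch. This is not the paper's implementation: the class name FailureAwareActor, the callables task_policy, recovery_policy, and failure_risk, and the risk_threshold value are all hypothetical stand-ins, and the risk function here is a plain callable rather than a learned latent world model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]


@dataclass
class FailureAwareActor:
    """Hypothetical wrapper: swap in a recovery action when predicted risk is high."""

    task_policy: Callable[[State], Action]         # policy being fine-tuned online
    recovery_policy: Callable[[State], Action]     # fixed policy, assumed trained offline
    failure_risk: Callable[[State, Action], float] # stand-in for the safety critic
    risk_threshold: float = 0.5                    # assumed value, not from the paper

    def act(self, state: State) -> Tuple[Action, bool]:
        """Return (action, overridden), where overridden marks a recovery action."""
        proposed = self.task_policy(state)
        risk = self.failure_risk(state, proposed)
        if risk > self.risk_threshold:
            # Predicted near-future failure: use the recovery policy instead.
            return self.recovery_policy(state), True
        return proposed, False


if __name__ == "__main__":
    # Toy 1-D example: moving past position 2.0 counts as a failure.
    actor = FailureAwareActor(
        task_policy=lambda s: [1.0],
        recovery_policy=lambda s: [-1.0],
        failure_risk=lambda s, a: 1.0 if s[0] + a[0] > 2.0 else 0.0,
    )
    print(actor.act([0.0]))  # safe state: task action is kept
    print(actor.act([1.5]))  # risky state: recovery action replaces it
```

The returned flag lets a training loop record which transitions came from the recovery policy; whether and how FARL uses such transitions for learning is not stated in this summary.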

Key results

  • Introduced FailureBench benchmark for failure-aware RL evaluation
  • Reduced real-world intervention-requiring failures by 73.1%
  • Improved average task performance by 11.3% during online fine-tuning
  • Achieved up to 65.8% failure reduction in challenging simulated environments
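
Failure-reduction figures like those above are commonly read as a relative reduction against a baseline. The sketch below shows that calculation under this assumption; the summary does not state how the paper computes its percentages, and the function name and the failure counts are hypothetical, not taken from the paper.

```python
def relative_reduction(baseline_failures: float, method_failures: float) -> float:
    """Percentage drop in failures relative to a baseline count."""
    if baseline_failures <= 0:
        raise ValueError("baseline_failures must be positive")
    return 100.0 * (baseline_failures - method_failures) / baseline_failures


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(relative_reduction(20, 5))  # 75.0
```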

Why it matters

Enables safer, more efficient real-world robotic policy refinement by minimizing costly human interventions during reinforcement learning post-training.

Abstract

Post-training algorithms based on deep reinforcement learning can push the limits of robotic models for specific objectives, such as generalizability, accuracy, and robustness. However, Intervention-requiring Failures (IR Failures) (e.g., a robot spilling water or breaking fragile glass) during real-world exploration happen inevitably, hindering the practical deployment of such a paradigm. To tackle this, we introduce Failure-Aware Offline-to-Online Reinforcement Learning (FARL), a framework for minimizing failures during real-world reinforcement learning. We create FailureBench, a benchmark that incorporates common failure scenarios requiring human intervention, and propose an algorithm that integrates a world-model-based safety critic and a recovery policy trained offline to prevent failures during online exploration. Extensive simulation and real-world experiments demonstrate the effectiveness of FARL in significantly reducing IR Failures while improving performance and generalization during online reinforcement learning post-training. FARL reduces IR Failures by 73.1% while elevating performance by 11.3% on average during real-world RL post-training.

Index terms

Reinforcement Learning; Deep Learning in Grasping and Manipulation; Robot Safety