ICRA 2026

Accelerating Residual Reinforcement Learning with Uncertainty Estimation

Lakshita Dodeja, Karl Schmeckpeper, Shivam Vats, Thomas Weng, Mingxi Jia, George Konidaris, Stefanie Tellex

AI summary

Leveraging base policy uncertainty to guide exploration and modifying the critic for combined actions significantly accelerates residual reinforcement learning and enables robust sim-to-real transfer.

Reinforcement learning, Residual RL, Uncertainty estimation, Stochastic policies, Sim-to-real transfer, Robot control

Problem

Existing residual reinforcement learning methods suffer from unconstrained exploration and are limited to deterministic base policies, making them inefficient and unsuitable for modern stochastic imitation learners.

Approach

The method uses the base policy's uncertainty estimates to restrict exploration to uncertain states and modifies the off-policy critic to learn Q-values for the combined base and residual actions, enabling stable training with stochastic policies.
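The uncertainty-gating idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy base policy, the use of the standard deviation of sampled actions as the uncertainty estimate, the fixed threshold, and all function names are assumptions made for illustration only.

```python
import random
import statistics

def base_policy_samples(state, n_samples=16, rng=None):
    """Hypothetical stochastic base policy returning sampled 1-D actions.
    Its noise grows with |state| to mimic low confidence far from the data."""
    rng = rng or random.Random(0)
    mean = -0.5 * state
    std = 0.05 + 0.5 * abs(state)
    return [rng.gauss(mean, std) for _ in range(n_samples)]

def uncertainty(samples):
    """Assumed uncertainty proxy: spread of the sampled base actions."""
    return statistics.pstdev(samples)

def select_action(state, residual_policy, threshold=0.2, rng=None):
    """Apply the residual correction only where the base policy is uncertain."""
    samples = base_policy_samples(state, rng=rng)
    base_action = samples[0]          # the action the base policy would execute
    u = uncertainty(samples)
    if u > threshold:
        # Uncertain state: let the residual policy explore and correct.
        residual = residual_policy(state, base_action)
    else:
        # Confident state: trust the base policy and skip the residual.
        residual = 0.0
    return base_action + residual, base_action, residual, u
```

Calling `select_action(0.0, lambda s, a: 0.1)` leaves the base action untouched because the sampled actions barely vary there, while a state such as `2.0` triggers the residual correction.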

Key results

Uncertainty-guided exploration focuses residual learning on high-uncertainty states
Asymmetric actor-critic formulation enables off-policy residual RL with stochastic base policies
Outperforms state-of-the-art finetuning, demo-augmented, and residual RL baselines across simulation benchmarks
Demonstrates successful zero-shot sim-to-real transfer on a physical robot
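One way to picture a critic that scores combined actions is the sketch below. It assumes a linear critic, a purely additive combination of base and residual actions, and a one-step TD target; these choices and every function name are hypothetical and are not taken from the paper.

```python
def combined_action(base_action, residual_action, scale=1.0):
    """Executed action: base action plus a (optionally scaled) residual."""
    return base_action + scale * residual_action

def td_target(reward, done, gamma, q_fn, next_state,
              next_base_action, next_residual_action):
    """One-step TD target that bootstraps from Q at the next combined action.
    next_base_action is a fresh sample from the stochastic base policy."""
    next_a = combined_action(next_base_action, next_residual_action)
    bootstrap = 0.0 if done else gamma * q_fn(next_state, next_a)
    return reward + bootstrap

def critic_update(weights, state, base_action, residual_action, target, lr=0.1):
    """One SGD step on a linear critic Q(s, a) = w0*s + w1*a + w2,
    evaluated at the combined action rather than the residual alone."""
    a = combined_action(base_action, residual_action)
    features = (state, a, 1.0)
    q = sum(w * f for w, f in zip(weights, features))
    error = target - q
    new_weights = [w + lr * error * f for w, f in zip(weights, features)]
    return new_weights, error
```

Because the critic sees the action that was actually executed, the variation introduced by sampling from a stochastic base policy is part of its input rather than unexplained noise in the return.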

Why it matters

Enables more sample-efficient and robust adaptation of modern stochastic robot policies, accelerating practical deployment in real-world environments.

Abstract

Residual Reinforcement Learning (RL) is a popular approach for adapting pretrained policies by learning a lightweight residual policy that provides corrective actions. While Residual RL is more sample-efficient than finetuning the entire base policy, existing methods struggle with sparse rewards and are designed for deterministic base policies. We propose two improvements to Residual RL that further enhance its sample efficiency and make it suitable for stochastic base policies. First, we leverage uncertainty estimates of the base policy to focus exploration on regions in which the base policy is not confident. Second, we propose a simple modification to off-policy residual learning that allows it to observe base actions and better handle stochastic base policies. We evaluate our method with both Gaussian-based and Diffusion-based stochastic base policies on tasks from Robosuite and D4RL, and compare against state-of-the-art finetuning methods, demo-augmented RL methods, and other Residual RL methods. Our algorithm significantly outperforms existing baselines in a variety of simulation benchmark environments. We also deploy our learned policies in the real world to demonstrate their robustness with zero-shot sim-to-real transfer.

Index terms

Reinforcement Learning, Deep Learning Methods, Machine Learning for Robot Control