ICRA 2026

GRAM: Generalization in Deep RL with a Robust Adaptation Module

James Queeney, Xiaoyi Cai, Alexander Schperberg, Radu Corcodel, Mouhacine Benosman, Jonathan How

AI summary

GRAM unifies adaptive and robust deep RL into a single architecture, enabling quadruped robots to reliably generalize to both familiar and novel environmental dynamics at deployment.

Deep reinforcement learning, dynamics generalization, robust adaptation, legged locomotion, epistemic uncertainty, zero-shot generalization

Problem

Deep RL policies typically excel at either adapting to known training conditions or remaining robust to unknown ones, but struggle to achieve both simultaneously for reliable real-world deployment.

Approach

GRAM quantifies deployment uncertainty using an epistemic neural network to dynamically blend adaptive latent features with a robust anchor, trained via a joint pipeline that merges teacher-student adaptation and adversarial robustness.
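
The uncertainty-weighted blending described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `blend_latent`, the use of ensemble disagreement as a stand-in for the epistemic neural network's uncertainty estimate, the exponential weighting, and the parameter `beta` are all assumptions made for illustration.

```python
import math
from statistics import fmean, pvariance


def blend_latent(ensemble_latents, robust_anchor, beta=5.0):
    """Blend an adaptive latent estimate with a fixed robust anchor.

    Hypothetical sketch: `ensemble_latents` is a list of latent vectors
    predicted by several heads for the same observation history. Their
    disagreement is used here as a proxy for epistemic uncertainty
    (an assumption, not the paper's stated estimator).
    """
    dims = len(robust_anchor)

    # Adaptive latent: per-dimension mean across the ensemble.
    mean_latent = [fmean(m[d] for m in ensemble_latents) for d in range(dims)]

    # Uncertainty proxy: per-dimension variance, averaged over dimensions.
    uncertainty = fmean(
        pvariance([m[d] for m in ensemble_latents]) for d in range(dims)
    )

    # Assumed weighting: alpha near 1 when the heads agree (trust the
    # adaptive latent), alpha near 0 when they disagree (fall back to anchor).
    alpha = math.exp(-beta * uncertainty)

    blended = [
        alpha * z + (1.0 - alpha) * r for z, r in zip(mean_latent, robust_anchor)
    ]
    return blended, alpha


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    # Heads agree: the blended latent stays at the adaptive estimate.
    print(blend_latent([[1.0, 2.0], [1.0, 2.0]], anchor))
    # Heads disagree strongly: the blended latent collapses toward the anchor.
    print(blend_latent([[3.0, -3.0], [-3.0, 3.0]], anchor))
```

In this sketch, a larger `beta` makes the policy fall back to the robust anchor at lower levels of disagreement; how GRAM actually sets this trade-off is specified in the paper, not here.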

Key results

Unified architecture achieving simultaneous in-distribution adaptation and out-of-distribution robustness
Uncertainty-aware adaptation module that biases latent contexts toward a robust anchor when uncertain
Joint training pipeline combining teacher-student learning and adversarial RL for balanced policy optimization
Demonstrated strong generalization across simulation and real-world quadruped locomotion tasks

Why it matters

Provides a practical pathway for deploying reliable, generalizable deep RL policies in complex, uncertain real-world robotic applications.

Abstract

The reliable deployment of deep reinforcement learning in real-world settings requires the ability to generalize across a variety of conditions, including both in-distribution scenarios seen during training as well as novel out-of-distribution scenarios. In this work, we present a framework for dynamics generalization in deep reinforcement learning that unifies these two distinct types of generalization within a single architecture. We introduce a robust adaptation module that provides a mechanism for identifying and reacting to both in-distribution and out-of-distribution environment dynamics, along with a joint training pipeline that combines the goals of in-distribution adaptation and out-of-distribution robustness. Our algorithm GRAM achieves strong generalization performance across in-distribution and out-of-distribution scenarios upon deployment, which we demonstrate through extensive simulation and hardware locomotion experiments on a quadruped robot.

Index terms

Reinforcement Learning; Machine Learning for Robot Control