GRAM: Generalization in Deep RL with a Robust Adaptation Module
James Queeney, Xiaoyi Cai, Alexander Schperberg, Radu Corcodel, Mouhacine Benosman, Jonathan How
AI summary
Problem
Deep RL policies typically excel at either adapting to known training conditions or remaining robust to unknown ones, but struggle to achieve both simultaneously for reliable real-world deployment.
Approach
GRAM quantifies deployment uncertainty using an epistemic neural network to dynamically blend adaptive latent features with a robust anchor, trained via a joint pipeline that merges teacher-student adaptation and adversarial robustness.
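As a rough illustration of the blending idea described above, the following minimal sketch pulls a latent estimate toward a fixed anchor as uncertainty grows. It is not the authors' implementation. The summary does not give the actual formula, so several pieces here are assumptions made for illustration: using the variance across a handful of sampled latent estimates as a stand-in for the epistemic neural network's uncertainty, the exponential weighting, the `scale` parameter, and the names `blend_latent` and `z_anchor`.

```python
import numpy as np

def blend_latent(z_samples, z_anchor, scale=1.0):
    """Blend an adaptive latent estimate with a robust anchor.

    z_samples: array of shape (n_samples, latent_dim); hypothetical samples
               of the latent context (a stand-in for epistemic network output).
    z_anchor:  array of shape (latent_dim,); hypothetical robust anchor.
    scale:     assumed sensitivity of the blend weight to uncertainty.
    """
    z_samples = np.asarray(z_samples, dtype=float)
    z_anchor = np.asarray(z_anchor, dtype=float)

    # Adaptive estimate: mean of the sampled latents.
    z_mean = z_samples.mean(axis=0)

    # Uncertainty proxy: disagreement between samples, averaged over dimensions.
    uncertainty = z_samples.var(axis=0).mean()

    # Assumed weighting: alpha is 1 with no disagreement and decays toward 0.
    alpha = float(np.exp(-scale * uncertainty))

    # High alpha keeps the adaptive latent; low alpha falls back to the anchor.
    z = alpha * z_mean + (1.0 - alpha) * z_anchor
    return z, alpha

anchor = np.zeros(2)
confident = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]    # samples agree
uncertain = [[5.0, -5.0], [-5.0, 5.0], [0.0, 0.0]]  # samples disagree

z_conf, a_conf = blend_latent(confident, anchor)
z_unc, a_unc = blend_latent(uncertain, anchor)
```

When the samples agree, the blend weight stays at 1 and the adaptive latent passes through unchanged. When they disagree strongly, the weight collapses toward 0 and the output sits at the anchor, which is the qualitative behavior the summary attributes to the robust adaptation module.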
Key results
- Unified architecture achieving simultaneous in-distribution adaptation and out-of-distribution robustness
- Uncertainty-aware adaptation module that biases latent contexts toward a robust anchor when uncertain
- Joint training pipeline combining teacher-student learning and adversarial RL for balanced policy optimization
- Demonstrated strong generalization across simulation and real-world quadruped locomotion tasks
Why it matters
Provides a practical pathway for deploying reliable, generalizable deep RL policies in complex, uncertain real-world robotic applications.
Abstract
The reliable deployment of deep reinforcement learning in real-world settings requires the ability to generalize across a variety of conditions, including both in-distribution scenarios seen during training as well as novel out-of-distribution scenarios. In this work, we present a framework for dynamics generalization in deep reinforcement learning that unifies these two distinct types of generalization within a single architecture. We introduce a robust adaptation module that provides a mechanism for identifying and reacting to both in-distribution and out-of-distribution environment dynamics, along with a joint training pipeline that combines the goals of in-distribution adaptation and out-of-distribution robustness. Our algorithm GRAM achieves strong generalization performance across in-distribution and out-of-distribution scenarios upon deployment, which we demonstrate through extensive simulation and hardware locomotion experiments on a quadruped robot.