Learning-Based Robust Control: Unifying Exploration and Distributional Robustness for Reliable Robotics Via Free Energy
Hozefa Jesawada, Giovanni Russo, Abdalla Swikir, Fares Abu-Dakka
AI summary
Problem
Learning-based robotic policies frequently fail in real-world deployments due to epistemic uncertainty in dynamics and rewards, yet existing methods do not provide explicit a priori robustness guarantees while also maintaining effective exploration.
Approach
The authors modify the Maximum Diffusion RL framework by embedding a distributionally robust free energy principle that jointly optimizes exploration via a maximally diffusive prior and enforces explicit robustness bounds against model misspecification through per-state-action KL ambiguity sets.
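The per-state-action KL ambiguity sets mentioned in the Approach can be illustrated with the standard convex dual of a KL-constrained worst case. The supremum of E_q[c] over all q with KL(q || p) <= eta equals the infimum over lambda > 0 of lambda * eta + lambda * log E_p[exp(c / lambda)], and the maximizing q is an exponential tilting of p. The Python sketch below is a minimal illustration under stated assumptions. It treats a single state-action pair with a finite set of next-state outcomes, nominal probabilities p, a cost c per outcome and a radius eta. The function names, the golden-section solver and the numbers are hypothetical choices for illustration, not the authors' implementation.

```python
import math

# Illustrative sketch: function names and numbers are hypothetical, not taken from the paper.


def _support(costs, probs):
    """Keep only outcomes with positive nominal probability."""
    return [(c, p) for c, p in zip(costs, probs) if p > 0]


def kl_worst_case_expectation(costs, probs, radius, lo=1e-6, hi=1e6, iters=200):
    """Worst-case E_q[c] over the KL ball {q : KL(q || p) <= radius}.

    Evaluates the convex dual
        inf_{lam > 0}  lam * radius + lam * log E_p[exp(c / lam)]
    by golden-section search over log(lam) in [log(lo), log(hi)].
    Returns (robust_value, lam).
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    support = _support(costs, probs)
    c_max = max(c for c, _ in support)

    def dual(log_lam):
        lam = math.exp(log_lam)
        # Log-sum-exp shift keeps exp(c / lam) from overflowing when lam is small.
        m = c_max / lam
        lse = m + math.log(sum(p * math.exp(c / lam - m) for c, p in support))
        return lam * radius + lam * lse

    ratio = (math.sqrt(5) - 1) / 2
    a, b = math.log(lo), math.log(hi)
    x1, x2 = b - ratio * (b - a), a + ratio * (b - a)
    f1, f2 = dual(x1), dual(x2)
    for _ in range(iters):
        if f1 <= f2:  # minimizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - ratio * (b - a)
            f1 = dual(x1)
        else:  # minimizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + ratio * (b - a)
            f2 = dual(x2)
    best = (a + b) / 2
    return dual(best), math.exp(best)


def worst_case_distribution(costs, probs, lam):
    """Exponentially tilted q_i proportional to p_i * exp(c_i / lam)."""
    m = max(c for c, _ in _support(costs, probs)) / lam
    w = [p * math.exp(c / lam - m) if p > 0 else 0.0 for c, p in zip(costs, probs)]
    z = sum(w)
    return [wi / z for wi in w]


def kl_divergence(q, p):
    """KL(q || p) for discrete distributions over the same outcomes."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


costs = [0.0, 1.0, 5.0]  # cost of each next-state outcome (made-up numbers)
probs = [0.5, 0.4, 0.1]  # nominal model probabilities (made-up numbers)
nominal = sum(c * p for c, p in zip(costs, probs))
robust, lam = kl_worst_case_expectation(costs, probs, radius=0.1)
q = worst_case_distribution(costs, probs, lam)
print(f"nominal={nominal:.3f} robust={robust:.3f} KL={kl_divergence(q, probs):.3f}")
```

As the radius grows from zero, the robust value moves from the nominal expectation toward the largest cost with positive nominal probability. This is the sense in which a KL ambiguity set bounds how pessimistic the evaluation can become under model misspecification.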
Key results
- Outperforms standard MaxDiff baselines on continuous control benchmarks
- Provides explicit a priori robustness guarantees against dynamics and cost perturbations
- Enables zero-shot sim-to-real deployment on a Franka Research 3 arm
- Narrows the sim-to-real gap by aligning control with epistemic risk
Why it matters
It offers a theoretically grounded, deployable control framework that bridges robust control theory and learning-based robotics for reliable real-world automation.
Abstract
A key challenge towards reliable robotic control is devising computational models that can both learn policies and guarantee robustness when deployed in the field. To address this challenge, and inspired by the free energy principle in computational neuroscience, we propose a model for policy computation that jointly learns environment dynamics and rewards while ensuring robustness to epistemic uncertainties. Expounding a distributionally robust free energy principle, we propose a modification to the maximum diffusion learning framework. After explicitly characterizing the robustness of our policies to epistemic uncertainties in both environment and reward, we validate their effectiveness on continuous-control benchmarks, via both simulations and real-world manipulation experiments with a Franka Research 3 arm. Across simulation and zero-shot deployment, our approach narrows the sim-to-real gap and enables repeatable tabletop manipulation without task-specific fine-tuning.