MADR: MPC-Guided Adversarial DeepReach
Ryan Teoh, Sander Tonkens, William Sharpless, Aijia Yang, Zeyuan Feng, Somil Bansal, Sylvia Herbert
AI summary
Problem
Classical Hamilton-Jacobi reachability guarantees safety in zero-sum games but fails in high dimensions due to the curse of dimensionality, while existing deep learning approximations converge slowly and inaccurately for complex adversarial interactions.
Approach
The method augments physics-informed deep learning with supervised signals from sampling-based MPC rollouts. In each rollout, the opponent's policy is derived from the gradient of the current value function, which yields separate control and disturbance datasets for robust training.
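As a rough illustration of the rollout idea, the sketch below generates one supervised value label for the control player on a toy 1D system. Everything in it is an assumption made for illustration and is not taken from the paper: the dynamics `x' = u + d`, the bounds `U_MAX` and `D_MAX`, the terminal payoff `l(x) = x`, the placeholder `value_grad`, and the helper names. The disturbance side would mirror this with the roles swapped, and a learned network would replace the placeholder gradient.

```python
import random

# Assumed toy problem: x' = u + d, |u| <= U_MAX, |d| <= D_MAX, payoff l(x) = x.
U_MAX, D_MAX, DT = 1.0, 0.5, 0.1


def value_grad(x):
    """Placeholder for the gradient of the current learned value function.

    Here V(x) = x, so dV/dx = 1 everywhere (hypothetical stand-in).
    """
    return 1.0


def disturbance_from_value(x):
    """Opponent policy from the value gradient: d minimizes grad * (u + d)."""
    g = value_grad(x)
    # Push against the gradient; the choice at g == 0 is an arbitrary tie-break.
    return -D_MAX if g > 0 else D_MAX


def rollout(x0, controls):
    """Euler-integrate the toy dynamics and return the terminal payoff l(x) = x."""
    x = x0
    for u in controls:
        x += DT * (u + disturbance_from_value(x))
    return x


def mpc_label(x0, horizon=10, n_samples=256, seed=0):
    """Sampling-based MPC: draw random control sequences and keep the best one."""
    rng = random.Random(seed)
    best_payoff, best_controls = float("-inf"), None
    for _ in range(n_samples):
        controls = [rng.uniform(-U_MAX, U_MAX) for _ in range(horizon)]
        payoff = rollout(x0, controls)
        if payoff > best_payoff:
            best_payoff, best_controls = payoff, controls
    return best_payoff, best_controls


if __name__ == "__main__":
    label, controls = mpc_label(0.0)
    print("supervised value label at x0 = 0:", label)
```

In a training loop, pairs such as `(x0, label)` would form the control dataset and be added as a supervised term alongside the physics-informed loss. How the two terms are weighted is not specified in this summary.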
Key results
- Significantly outperforms state-of-the-art baselines in simulation fidelity and safety margins
- Successfully scales to high-dimensional systems with varying dynamics across TurtleBots, drones, and humanoids
- Delivers robust real-world hardware performance against adversarial agents and disturbances
- Introduces a pursuit-evasion filter to mitigate long-horizon suboptimality for the adversary
Why it matters
Enables scalable, theoretically grounded safety-critical control for complex robots operating in unpredictable or adversarial real-world environments.
Abstract
Hamilton-Jacobi Reachability offers a framework for generating safe value functions and policies in the face of adversarial disturbance, but is limited by the curse of dimensionality. Physics-informed deep learning is able to overcome this infeasibility, but itself suffers from slow and inaccurate convergence, primarily due to weak PDE gradients and the complexity of self-supervised learning. Recent works have demonstrated that enriching the self-supervision process with regular supervision (based on the nature of the optimal control problem) greatly accelerates convergence and solution quality; however, these have been limited to single-player problems and simple games. In this work, we introduce MADR: MPC-guided Adversarial DeepReach, a general framework to robustly approximate the two-player, zero-sum differential game value function. In doing so, MADR yields the corresponding optimal strategies for both players in zero-sum games as well as safe policies for worst-case robustness. We test MADR on a multitude of high-dimensional simulated and real robotic agents with varying dynamics and games, finding that our approach significantly outperforms state-of-the-art baselines in simulation and produces impressive results in hardware.
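For orientation, the PDE that physics-informed reachability methods train against can be written in one common convention as follows. This is a sketch of the standard Hamilton-Jacobi-Isaacs variational inequality for an avoid-type game, not necessarily the exact formulation used in the paper. The symbols are assumptions: dynamics $f$, control $u$, disturbance $d$, safety margin $\ell$, horizon $T$, and the choice of which player maximizes.

```latex
% Dynamics of the two-player system (u: control, d: disturbance)
\dot{x} = f(x, u, d), \qquad u \in \mathcal{U}, \; d \in \mathcal{D}

% HJI variational inequality with terminal condition
\min\Big\{ \partial_t V(x,t) + H\big(x, \nabla_x V(x,t)\big), \;
           \ell(x) - V(x,t) \Big\} = 0,
\qquad V(x, T) = \ell(x)

% Hamiltonian: control maximizes, disturbance minimizes
H(x, p) = \max_{u \in \mathcal{U}} \, \min_{d \in \mathcal{D}} \; p^{\top} f(x, u, d)
```

Under this convention, the self-supervised loss penalizes the residual of the variational inequality at sampled $(x, t)$ points, and the optimizers of $H$ give each player's strategy from $\nabla_x V$.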