ICRA 2026

Deep Reinforcement Learning for Reach-Avoid-Stay Problems

Gabriel Chenevert, Jingqi Li, Achyuta Kannan, Sangjae Bae, Donggun Lee

AI summary

A two-step deep reinforcement learning framework accurately computes the maximal robust reach-avoid-stay set and a switching policy, outperforming existing safety-critical control methods.

Reach-Avoid-Stay; Deep Reinforcement Learning; Robust Viability Kernel; Safety-Critical Control; Reachability Analysis; Switching Policy

Problem

Existing reachability methods struggle to compute the maximal robust reach-avoid-stay (RAS) set, i.e., the set of all states from which a system can safely reach a target and remain within it under all bounded disturbances. The policies they produce are often overly conservative or unsafe.

Approach

The method uses a two-step deep RL process. The first step identifies the robust viability kernel of the target set (the region in which the system can stay indefinitely despite disturbances). The second step computes the maximal reach-avoid set that takes this kernel as its target. The two resulting policies are then combined into a switching control policy via a novel value function transformation.
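The switching logic can be illustrated with a minimal Python sketch. It does not reproduce the paper's value function transformation. It only assumes that a learned "stay" value function is non-negative exactly on the kernel, which is a sign convention chosen for this illustration. The names `make_switching_policy`, `stay_value`, `stay_policy`, and `reach_policy` are hypothetical, and the toy 1-D functions stand in for trained networks.

```python
from typing import Callable, Sequence

State = Sequence[float]
Control = Sequence[float]


def make_switching_policy(
    stay_value: Callable[[State], float],
    stay_policy: Callable[[State], Control],
    reach_policy: Callable[[State], Control],
) -> Callable[[State], Control]:
    """Use reach_policy outside the kernel and stay_policy inside it."""

    def policy(x: State) -> Control:
        # Assumed convention: stay_value(x) >= 0 iff x lies in the kernel.
        if stay_value(x) >= 0.0:
            return stay_policy(x)
        return reach_policy(x)

    return policy


# Toy 1-D stand-ins for the learned components (made up for this example).
stay_value = lambda x: 1.0 - abs(x[0])               # >= 0 on [-1, 1]
stay_policy = lambda x: [-x[0]]                      # pull toward the origin
reach_policy = lambda x: [-1.0 if x[0] > 0 else 1.0]  # head toward the kernel

policy = make_switching_policy(stay_value, stay_policy, reach_policy)
print(policy([0.5]))  # inside the kernel: stay policy is used
print(policy([3.0]))  # outside the kernel: reach policy is used
```

Under this convention the switch happens once, when the state first enters the kernel, after which the stay policy keeps it there.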

Key results

Proves theoretical equivalence between the computed reach-avoid set and the maximal robust RAS set
Computes the exact maximal robust RAS set in low-dimensional cases when training error is absent
Achieves over 95% accuracy for high-dimensional systems despite training errors
Outperforms CLBF and baseline methods with higher success rates and fewer false negatives

Why it matters

Provides a scalable, theoretically grounded method for safe motion planning in high-dimensional robotic systems subject to environmental disturbances.

Abstract

Reach-Avoid-Stay (RAS) tasks are essential in applications where systems must safely reach a target set and remain within it under all bounded disturbances. Existing approaches either struggle to compute the maximal robust RAS set (the set of all states from which the RAS task is achievable) or are limited in handling general dynamic systems. To address these challenges, this paper proposes a two-step deep reinforcement learning framework that jointly learns the maximal robust RAS set and the corresponding control policy. The first step identifies the maximal robust control-invariant set within the target set and derives a policy that ensures the system remains within it. The second step computes the maximal robust reach-avoid (RA) set using this invariant set as the target, and it is proven that this RA set is equivalent to the maximal robust RAS set. Leveraging this result, a switching policy is constructed from the two step-wise policies, which constitutes a valid policy guaranteeing completion of the RAS task. Simulation results demonstrate that the proposed framework (1) computes the exact maximal robust RAS set in the absence of training errors, yielding the least restrictive RAS policy, and (2) identifies the RAS set with high accuracy while outperforming baseline methods on RAS tasks.
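The two-step construction can be made concrete on a toy problem. The sketch below is not the paper's deep RL method. It swaps in plain tabular fixed-point iteration on a hypothetical integer-grid system x' = x + u + d, where the controller picks u and the disturbance picks d. The grid, control and disturbance ranges, target, obstacle, and all function names are assumptions made for illustration, and states leaving the grid are treated as failures.

```python
# Toy tabular stand-in (NOT deep RL): x' = x + u + d on an integer grid.
STATES = range(16)
CONTROLS = (-2, -1, 0, 1, 2)
DISTURBANCES = (-1, 0, 1)
TARGET = {2, 3, 6, 7, 8, 9}   # hypothetical target set (two separate pieces)
OBSTACLES = {12}              # hypothetical obstacle set


def robustly_enters(x, goal):
    """True if some control sends x into `goal` for every disturbance."""
    return any(all(x + u + d in goal for d in DISTURBANCES) for u in CONTROLS)


def invariant_kernel(target):
    """Step 1: largest subset of `target` in which the state can be kept forever."""
    kernel = set(target)
    while True:
        kept = {x for x in kernel if robustly_enters(x, kernel)}
        if kept == kernel:
            return kernel
        kernel = kept


def reach_avoid_set(goal, obstacles):
    """Step 2: states that can be driven into `goal` without touching obstacles.

    Assumes `goal` and `obstacles` are disjoint.
    """
    reach = set(goal)
    while True:
        grown = reach | {
            x for x in STATES
            if x not in obstacles and robustly_enters(x, reach)
        }
        if grown == reach:
            return reach
        reach = grown


kernel = invariant_kernel(TARGET)
ras_set = reach_avoid_set(kernel, OBSTACLES)
print(sorted(kernel))   # [6, 7, 8, 9]
print(sorted(ras_set))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

In this toy case, states 2 and 3 belong to the target but not to the kernel, since the disturbance can always push the system out of that small piece. They still belong to the RAS set because the controller can steer from them into the kernel, which is why the second step uses the kernel rather than the full target as its goal.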

Index terms

Optimization and Optimal Control; Machine Learning for Robot Control; Deep Learning Methods