Anytime Probabilistically Constrained Provably Convergent Online Belief Space Planning
Andrey Zhitnikov, Vadim Indelman
AI summary
Problem
Existing online planning methods for partially observable environments either ensure safety only at infinite convergence time, allow dangerous actions to corrupt search statistics, or rely on slow learning components in continuous domains.
Approach
The authors propose an anytime Monte Carlo Tree Search algorithm that directly prunes unsafe actions from the belief tree and revises node statistics to remove their influence, enabling safe planning in continuous spaces with general probabilistic constraints.
Key results
- Anytime safety guarantees with respect to the expanded belief tree
- Proof of convergence in probability with an exponential rate
- Demonstration that cleaning pruned actions prevents objective degradation
- Superior safety and objective performance over baselines with minimal tree queries
Why it matters
Enables reliable, safe autonomous decision-making for robots under uncertainty without sacrificing computational efficiency or waiting for algorithmic convergence.
Abstract
Taking into account future risk is essential for an autonomously operating robot to find online not only the best but also a safe action to execute. In this paper, we build upon the recently introduced formulation of probabilistic belief-dependent constraints. In our methodology safety can be materialized with any general belief-dependent operator we call payoff. We present an anytime approach employing the Monte Carlo Tree Search (MCTS) method in continuous domains in terms of states, actions and observations and general-belief dependent reward and payoff operators. Unlike previous approaches, our method ensures safety anytime with respect to the currently expanded search tree without relying on the convergence of the search. We prove convergence in probability with an exponential rate of a version of our algorithms and study proposed techniques via extensive simulations. Even with a tiny number of tree queries, the best action found by our approach is much safer than the baseline. Moreover, our approach constantly yields better than the baseline action in terms of objective function. This is because we revise the values and statistics maintained in the search tree and remove from them the contribution of the pruned actions. We rigorously show that our cleaning routine is necessary. Without it, at the limit of convergence of MCTS, an infinite amount of sampled dangerous actions can be detrimental to the objective function.