← Back ICRA 2024

Multi-Level Action Tree Rollout (MLAT-R): Efficient and Accurate Online Multiagent Policy Improvement

Andrea Henshall, Sertac Karaman

PDF

Abstract

Rollout algorithms are renowned for their abilities to correct for the suboptimalities of offline-trained base policies. In the multiagent setting, performing online rollout can require an exponentially large number of optimizations with respect to the number of agents. One-agent-at-a-time algorithms of- fer computationally efficient approaches to guaranteed policy improvement; however, this improvement is with respect to a state value estimate derived from a potentially poor base policy. Monte Carlo tree search (MCTS) provably converges to the true state value estimates; however, the exponentially large search space often makes its online use limited. Here, we present the Multi-Level Action Tree Rollout (MLAT-R) algorithm. MLAT-R provides 1) provable improvement over a base policy, 2) policy improvement with respect to the true state value, 3) applicability to any number of agents, and 4) an action space that grows linearly with the number of agents rather than exponentially. In this paper, we outline the algorithm, sketch a proof of its improvement over a base policy, and evaluate its performance on a challenging problem for which the base policy cannot reach a terminal state. Despite the challenging experimental setup, our algorithm reached a terminal state in 86% of all experiments, compared to 31% for state-of-the-art one-agent-at-a-time algorithms. In experiments involving MCTS, MLAT-R reached a terminal state in 99% of experiments compared to 92% for MCTS. MLAT-R achieved these results while considering an exponentially smaller action space than MCTS.

Index terms

Multi-Robot Systems Reinforcement Learning Optimization and Optimal Control