ICRA 2026

R-FAC: Resilient Value Function Factorization for Multi-Robot Efficient Search with Individual Failure Probabilities

Hongliang Guo, Qi Kang, Wei-Yun Yau, Chee-Meng Chew, Daniela Rus

AI summary

Key figure (auto-extracted from paper)

V2DN dynamically aggregates individual robot value functions to maintain optimal search performance despite unpredictable robot failures.

Resilient multi-robot search, Multi-agent reinforcement learning, Value function factorization, Robot failure, Log-sum-exp, V2DN

Problem

Existing multi-robot search algorithms assume all team members remain operational, but real-world deployments frequently suffer from unpredictable individual robot failures that degrade coordination and detection speed.

Approach

The authors introduce the Resilient Value Function Factorization (R-FAC) paradigm and instantiate it as V2DN, a multi-agent reinforcement learning algorithm that uses a log-sum-exp mechanism to dynamically combine individual value functions into a central value function that adapts to any team size.
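To make the aggregation step concrete, here is a minimal sketch of combining per-robot values with log-sum-exp over whichever robots are still active, next to the plain summation used by VDN. It is an illustration under assumptions, not the authors' implementation: the function names, the boolean `active` mask, and the `temperature` parameter are hypothetical and do not come from the paper.

```python
import numpy as np

def lse_mix(q_individual, active, temperature=1.0):
    """Combine the active robots' chosen-action values with log-sum-exp.

    q_individual: shape (n_robots,), each robot's value for its chosen action.
    active: boolean mask, shape (n_robots,), False for robots that have failed.
    temperature: hypothetical smoothing parameter (an assumption, not from the paper).
    """
    q = np.asarray(q_individual, dtype=float)[np.asarray(active, dtype=bool)]
    if q.size == 0:
        raise ValueError("at least one robot must be active")
    z = q / temperature
    z_max = z.max()  # subtract the max before exponentiating for numerical stability
    return temperature * (z_max + np.log(np.exp(z - z_max).sum()))

def vdn_mix(q_individual, active):
    """VDN-style aggregation: plain sum over the active robots' values."""
    q = np.asarray(q_individual, dtype=float)[np.asarray(active, dtype=bool)]
    return q.sum()

# Illustrative numbers only: three robots, then robot 1 drops out.
q = [2.0, 0.5, -1.0]
full_team = [True, True, True]
after_failure = [True, False, True]

q_full = lse_mix(q, full_team)
q_reduced = lse_mix(q, after_failure)
```

The same function handles any number of active robots because failed robots are simply masked out before aggregation. Log-sum-exp is a smooth approximation of the maximum, so the result always lies between the largest individual value and that value plus `temperature * log(n_active)`.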

Key results

Introduction of the R-FAC paradigm and V2DN algorithm
Highest resiliency score against state-of-the-art baselines in MUSEUM and OFFICE environments
Search performance maintained when one or several robots drop out during task execution
Successful real-world multi-robot deployment in a custom indoor environment

Why it matters

Enables reliable coordination of multi-robot teams in hazardous or unpredictable environments where hardware failures are inevitable.

Abstract

This paper investigates the resilient multi-robot efficient search problem (R-MuRES), which aims at coordinating multiple robots to detect a ‘non-adversarial’ moving target with the minimal expected time. One unique characteristic of R-MuRES among others is the possibility of individual robot’s malfunction and withdrawal from the team during task execution, which results in a variable number of searchers in the deployment phase and entails that the possibility of team member failures must be considered during the planning stage, particularly in the training phase. We propose a resilient value function factorization (R-FAC) paradigm, which constructs the central value function from individual ones in a resilient manner, taking into account individual robots’ failures, and ensures that the constructed central value function has the minimal mean squared temporal difference error across various team compositions. R-FAC stipulates that the individual global maximum (IGM) principle is satisfied for whichever team configuration and thus any functioning robot contributes positively to the remaining team, as long as it executes the greedy policy with respect to the factorized individual value function. Subsequently, we introduce the variational value decomposition network (V2DN) as one of the instantiated R-FAC algorithms. V2DN employs the log-sum-exp mechanism to construct the central value function from individual ones, enabling it to take a varying number of robots’ individual value functions as inputs. Then, we explain why, specifically for the multi-robot search task, the log-sum-exp mechanism is superior to the brute-force summation operation used in the canonical value decomposition network (VDN), and compare V2DN with state-of-the-art MuRES solutions as well as the vanilla VDN algorithm in two canonical MuRES testing environments and show that it achieves the best resiliency score when one or several individual robots quit the team during task execution. Furthermore, we validate V2DN with a real multi-robot system in a self-constructed indoor environment as the proof of concept.
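The IGM requirement in the abstract can be checked numerically on a toy case. The sketch below assumes the central value is the plain log-sum-exp of the active robots' individual values; the paper's exact mixing form is not given here. It uses random numbers in place of learned value functions, and all names are hypothetical. For every non-empty subset of a three-robot team, it checks by brute force that the joint action maximizing the central value equals the tuple of each robot's own greedy action.

```python
import itertools
import numpy as np

def lse(values):
    """Numerically stable log-sum-exp of a 1-D sequence."""
    v = np.asarray(values, dtype=float)
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

rng = np.random.default_rng(0)
n_robots, n_actions = 3, 4
# Hypothetical per-robot action values Q_i(a_i); random stand-ins for learned networks.
q_tables = rng.normal(size=(n_robots, n_actions))

def igm_holds(team):
    """True if the joint argmax of the central value equals the per-robot greedy actions."""
    greedy = tuple(int(q_tables[i].argmax()) for i in team)
    best_joint = max(
        itertools.product(range(n_actions), repeat=len(team)),
        key=lambda joint: lse([q_tables[i, a] for i, a in zip(team, joint)]),
    )
    return best_joint == greedy

# Every non-empty team composition that can remain after failures.
teams = [team for k in range(1, n_robots + 1)
         for team in itertools.combinations(range(n_robots), k)]
results = {team: igm_holds(team) for team in teams}
```

The check passes for every subset because log-sum-exp is strictly increasing in each of its arguments, so raising any single robot's value can only raise the central value regardless of which teammates remain.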

Index terms

Multi-Robot Systems, Learning and Adaptive Systems, Cooperating Robots, Distributed Robot Systems