STAGE: Structure-Adaptive Graph-Encoded Multi-Agent Policy Gradient for Moving Target Search in Uncertain Topological Networks
Qihang Peng, Lizhou Zhu, Lekai Chen, Hongliang Guo, Chih-yung Wen
AI summary
Problem
Existing multi-robot efficient search (MuRES) algorithms assume a fixed network topology. That assumption fails in real-world scenarios where edges can become blocked or newly revealed, forcing a trade-off between discarding prior map knowledge and relying on outdated structural assumptions.
Approach
The proposed STAGE algorithm uses a bi-scale graph attention network (GAT) encoder to capture both local and long-range topological changes. It combines this encoder with an entropy-regularized counterfactual policy gradient that trains decentralized multi-robot policies able to adapt to uncertain environments.
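To make "local and long-range topological changes" concrete, here is a small sketch on a hypothetical 6-node ring network in which one edge, (2, 3), turns out to be blocked. It assumes undirected edges and plain BFS hop counts as the distance measure. The summary does not specify STAGE's actual distance features, and `hop_distances` and `k_hop_neighborhood` are illustrative names, not the authors' code.

```python
import math
from collections import deque

def hop_distances(num_nodes, edges):
    """All-pairs shortest hop counts via BFS; unreachable pairs stay math.inf.
    Assumption: edges are undirected and every edge has unit length."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[math.inf] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[src][w] == math.inf:  # first visit gives the shortest hop count
                    dist[src][w] = dist[src][u] + 1
                    queue.append(w)
    return dist

def k_hop_neighborhood(dist, node, k):
    """Nodes a k-hop local attention layer would see from `node`, itself included."""
    return {j for j, d in enumerate(dist[node]) if d <= k}

# Hypothetical 6-node ring: 0-1-2-3-4-5-0. The edge (2, 3) is later found blocked.
ring = [(i, (i + 1) % 6) for i in range(6)]
blocked = [e for e in ring if e != (2, 3)]

before = hop_distances(6, ring)
after = hop_distances(6, blocked)
```

Blocking the edge shrinks node 2's 1-hop neighborhood from {1, 2, 3} to {1, 2} and stretches the 2-to-3 distance from 1 to 5. Node 1's 1-hop neighborhood stays {0, 1, 2}, yet its distance to node 3 grows from 2 to 4. That change is invisible to a purely local view, which is the motivation for pairing a k-hop encoder with a long-range one.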
Key results
- Introduces STAGE, a novel multi-agent reinforcement learning (MARL) algorithm explicitly designed for MuRES under uncertain topologies.
- Proposes a distance-augmented long-range GAT that captures global structural changes while mitigating over-smoothing.
- Integrates entropy regularization into counterfactual policy gradients to stabilize learning and enhance exploration.
- Demonstrates feasibility and superior performance over state-of-the-art baselines through extensive simulations and physical experiments.
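The entropy-regularized counterfactual policy gradient listed above can be illustrated for a single agent with a softmax policy. This sketch assumes a COMA-style counterfactual baseline: the agent's own action is marginalized under its policy while the other agents' actions are held fixed. It also assumes a weight `alpha` on the policy entropy. The critic's Q-values are passed in as plain numbers rather than produced by STAGE's structure-aware critic, and every function name is hypothetical. The function returns the gradient-ascent direction on the agent's logits for J = A log π(u) + α H(π), with the advantage A treated as a constant.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def counterfactual_advantage(q_alternatives, probs, chosen):
    """COMA-style advantage for one agent (assumed form).
    q_alternatives[u] is Q(s, (u_-a, u)): the joint-action value when this agent
    swaps in action u while the other agents' actions stay fixed."""
    baseline = sum(p * q for p, q in zip(probs, q_alternatives))
    return q_alternatives[chosen] - baseline

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def actor_logit_gradient(logits, q_alternatives, chosen, alpha=0.01):
    """Ascent direction on one agent's logits for J = A*log pi(chosen) + alpha*H(pi)."""
    probs = softmax(logits)
    adv = counterfactual_advantage(q_alternatives, probs, chosen)  # treated as constant
    h = entropy(probs)
    grad = []
    for k, p in enumerate(probs):
        d_logp = (1.0 if k == chosen else 0.0) - p  # d log pi(chosen) / d z_k
        d_entropy = -p * (math.log(p) + h)          # d H / d z_k
        grad.append(adv * d_logp + alpha * d_entropy)
    return grad
```

With a uniform policy and equal Q-values, both the advantage and the entropy gradient vanish. A chosen action whose Q-value exceeds the policy-weighted baseline has its logit pushed up, and the entropy term pulls a peaked policy back toward uniform, which is the exploration effect the list item refers to.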
Why it matters
Enables reliable and efficient multi-robot search in dynamic real-world settings such as disaster response, where infrastructure damage and structural uncertainty are common.
Abstract
This paper investigates the multi-robot efficient search (MuRES) problem in uncertain topological networks. One distinctive characteristic of the studied problem is that the topology of the underlying network is uncertain, which poses great challenges to canonical MuRES solutions that presume a fixed network topology. To address this challenge, this paper proposes the STructure-Adaptive Graph-Encoded policy gradient (STAGE) algorithm for moving target search. STAGE comprises two main components: (1) the bi-scale graph attention network (GAT) encoder, which fuses a k-hop local GAT with a distance-augmented long-range GAT so that the encoder captures both local and long-range network structural changes; and (2) the entropy-regularized counterfactual policy gradient module, which employs a structure-aware centralized critic to estimate both the team returns and the network structure information, and trains the decentralized actors via counterfactual marginalization with entropy regularization. Extensive simulation results and physical experiments demonstrate the feasibility and superiority of STAGE for solving MuRES in uncertain topological environments.
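The bi-scale encoder described in the abstract can be sketched, under heavy simplification, as two attention readouts per node that are concatenated. One readout is restricted to the k-hop neighborhood. The other covers every reachable node, with a score penalty proportional to hop distance. None of the following assumptions come from the paper: scores are raw dot products of node features rather than learned GAT projections, the distance augmentation is a linear penalty `beta * dist`, and fusion is concatenation. `bi_scale_encode` and its helpers are hypothetical names. The hop-distance matrix is an input, so it has to be recomputed whenever an edge is blocked or revealed.

```python
import math

def softmax_masked(scores):
    """Softmax in which -inf entries receive exactly zero weight."""
    m = max(s for s in scores if s != -math.inf)
    exps = [0.0 if s == -math.inf else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(features, weights):
    """Attention-weighted sum of node feature vectors."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

def bi_scale_encode(features, dist, k=1, beta=1.0):
    """Toy bi-scale encoder (hypothetical): concatenates a k-hop masked attention
    readout with a distance-penalized readout over all reachable nodes.
    dist[i][j] is a hop count, or math.inf if j is unreachable from i."""
    n = len(features)

    def dot(i, j):
        return sum(a * b for a, b in zip(features[i], features[j]))

    encoded = []
    for i in range(n):
        # Local scale: only nodes within k hops (self is always at distance 0).
        local_scores = [dot(i, j) if dist[i][j] <= k else -math.inf for j in range(n)]
        # Long-range scale: every reachable node, penalized by hop distance.
        far_scores = [dot(i, j) - beta * dist[i][j] if dist[i][j] != math.inf
                      else -math.inf for j in range(n)]
        local = aggregate(features, softmax_masked(local_scores))
        far = aggregate(features, softmax_masked(far_scores))
        encoded.append(local + far)  # list concatenation: [local..., far...]
    return encoded
```

For a 4-node path with features [1], [0], [0], [5], cutting the middle edge leaves node 0's 1-hop readout unchanged. It does, however, remove node 3 from node 0's long-range readout, which drops that component from about 3.74 to about 0.88.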