ICRA 2026

TACC: Multi-Agent Reinforcement Learning for Task Allocation with Communication Coordination in UAV Swarms

Zehao Xiong, Yexun Xi, Yizhe Cao, Chuan Li, Rong Li, Lixian Shen and Jie Li∗

AI summary

TACC balances communication overhead and task allocation reliability in UAV swarms using constrained multi-agent reinforcement learning, achieving state-of-the-art performance in both simulation and real-world flight tests.

UAV swarms, multi-agent reinforcement learning, task allocation, communication coordination, constrained policy optimization, hardware-in-the-loop

Problem

Existing multi-agent reinforcement learning methods for UAV task allocation ignore the structural tension between avoiding task conflicts and limiting communication overhead, which leads to exploration difficulties and training instability in dynamic, communication-constrained environments.

Approach

The authors propose TACC, a POMDP-based framework that learns an adaptive communication gating mechanism to coordinate transmission timing, combined with an asynchronous experience aggregation method and Multi-Objective Constrained Policy Optimization (MOCPO) to stabilize training and balance multiple objectives.
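To make the gating idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each UAV samples a binary gate (1 = transmit, 0 = stay silent) and that the shared team reward penalizes both duplicate task claims and every transmission. The function names and the weights `conflict_penalty` and `comm_cost` are hypothetical and do not come from the paper.

```python
import random


def gate_decision(gate_prob: float, rng: random.Random) -> int:
    """Sample a binary gating action: 1 = transmit, 0 = stay silent."""
    return 1 if rng.random() < gate_prob else 0


def shared_reward(assignments, gates, conflict_penalty=1.0, comm_cost=0.1):
    """Hypothetical team reward shared by all agents.

    assignments: task id claimed by each UAV, or None if idle.
    gates: binary gating action taken by each UAV in this step.
    """
    claimed = [task for task in assignments if task is not None]
    unique_tasks = len(set(claimed))
    # A conflict is any claim beyond the first one on the same task.
    conflicts = len(claimed) - unique_tasks
    return unique_tasks - conflict_penalty * conflicts - comm_cost * sum(gates)


if __name__ == "__main__":
    rng = random.Random(0)
    gates = [gate_decision(0.5, rng) for _ in range(4)]
    print(gates, shared_reward([0, 1, 1, None], gates))
```

Under these assumptions, sending fewer messages raises the reward only while it does not cause additional duplicate claims, which is the trade-off the gate has to learn.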

Key results

Optimal trade-off between communication efficiency and allocation reliability
Stabilized training and faster convergence via MOCPO's Lagrangian loss
Successful deployment of the communication strategy on RK3588 SoC hardware
Superior scheduling outcomes in a ten-UAV swarm search-and-rescue flight experiment

Why it matters

Provides a robust, deployable framework for real-time distributed task allocation in communication-limited UAV swarms, bridging the gap between theoretical MARL and practical multi-agent robotics.

Abstract

Task allocation in UAV swarms is increasingly challenging due to task complexity, communication limits, and algorithm robustness. Combining reinforcement learning with task allocation offers promise but often ignores the tension between avoiding task conflicts and limiting communication overhead, causing exploration and stability issues. This paper proposes Task Allocation with Communication Coordination (TACC), which learns a gated mechanism to balance communication efficiency and allocation reliability. TACC is modeled as a POMDP with adaptive gating actions and shared rewards for task conflicts, and an asynchronous experience aggregation method is designed for centralized training with decentralized execution (CTDE). We further introduce Multi-Objective Constrained Policy Optimization (MOCPO), which applies constrained policy optimization via a Lagrangian loss to stabilize training and improve convergence. Finally, sim-to-real experiments are conducted in a hardware-in-the-loop (HIL) environment, and the results demonstrate the optimal trade-off achieved by the proposed method and its overall state-of-the-art performance. Ablation studies and hyperparameter experiments further validate the stability of MOCPO. Specifically, the communication strategy is effectively deployed on the RK3588 SoC, and the flight experiment demonstrates the superior scheduling outcomes of TACC within a ten-UAV swarm in a search-and-rescue scenario.
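The abstract does not give the form of the Lagrangian loss. As an illustration only, the sketch below shows the generic Lagrangian relaxation used in constrained policy optimization: the policy minimizes the negative reward return plus a multiplier times the constraint violation, and the multiplier is raised by projected dual ascent while the cost exceeds its limit. All names (`lagrangian_loss`, `update_multiplier`, `cost_limit`, `lr`) are assumptions, and MOCPO's actual loss may differ.

```python
def lagrangian_loss(reward_return: float, cost_return: float,
                    cost_limit: float, lam: float) -> float:
    """Scalar loss for the policy: maximize reward, penalize constraint violation."""
    return -reward_return + lam * (cost_return - cost_limit)


def update_multiplier(lam: float, cost_return: float,
                      cost_limit: float, lr: float = 0.05) -> float:
    """Projected dual ascent: the multiplier grows while the cost exceeds its limit."""
    return max(0.0, lam + lr * (cost_return - cost_limit))


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical per-iteration communication costs against a limit of 1.0.
    for cost in [2.0, 1.5, 1.0, 0.5]:
        loss = lagrangian_loss(reward_return=3.0, cost_return=cost,
                               cost_limit=1.0, lam=lam)
        lam = update_multiplier(lam, cost, cost_limit=1.0)
        print(f"cost={cost:.1f} loss={loss:.3f} lambda={lam:.3f}")
```

In this generic scheme the multiplier acts as an adaptive penalty weight, which is one common reason a Lagrangian loss can be easier to train than a hand-tuned fixed penalty.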

Index terms

Planning, Scheduling and Coordination; AI-Based Methods; Swarm Robotics