ICRA 2026

Graph-Based Multi-Agent Reinforcement Learning for Scalable UAV Formation Control and Target Tracking

Haowen Wang, Shuting Zhang, Guangchen Li

AI summary

A conflict-aware, graph-based reinforcement learning framework lets UAV swarms track targets and maintain formations in cluttered environments, and policies trained on small swarms transfer to larger ones without retraining.

Multi-Agent Reinforcement Learning, UAV Swarms, Formation Control, Target Tracking, Graph Neural Networks, Motion Primitives

Problem

Classical control methods and existing multi-agent reinforcement learning frameworks struggle to simultaneously maintain stable formations and track agile targets in cluttered environments, often failing to scale to large swarms or adapt to dynamic obstacles.

Approach

The method uses a graph neural network with conflict-aware attention to aggregate neighborhood information, paired with a hierarchical policy that selects discrete motion primitives and refines them with continuous adjustments for dynamically feasible maneuvers.
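
As a rough illustration of attention-based neighborhood aggregation, the sketch below weights each neighbor by a softmax over a conflict score and returns a fixed-size summary. It is a minimal sketch, not the paper's architecture: the `conflict_score` heuristic (closing speed plus inverse distance), the `safe_dist` parameter, and the 2D feature layout are all assumptions made for illustration, and a learned network would replace the hand-written score.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conflict_score(rel_pos, rel_vel, safe_dist=1.0):
    """Hypothetical conflict term: larger when a neighbor is close and approaching."""
    dist = math.hypot(rel_pos[0], rel_pos[1])
    # Closing speed is positive when the relative distance is shrinking.
    closing = -(rel_pos[0] * rel_vel[0] + rel_pos[1] * rel_vel[1]) / (dist + 1e-9)
    return max(closing, 0.0) + safe_dist / (dist + 1e-9)

def aggregate(neighbors):
    """Attention-weighted sum of neighbor features.

    neighbors: list of (rel_pos, rel_vel) pairs, any length >= 1.
    Returns (weights, feature), where feature is always [px, py, vx, vy].
    """
    scores = [conflict_score(p, v) for p, v in neighbors]
    weights = softmax(scores)
    feats = [list(p) + list(v) for p, v in neighbors]
    agg = [sum(w * f[k] for w, f in zip(weights, feats)) for k in range(4)]
    return weights, agg

# One close, approaching neighbor and one distant, stationary neighbor.
weights, feature = aggregate([((1.0, 0.0), (-1.0, 0.0)), ((5.0, 0.0), (0.0, 0.0))])
```

The aggregated feature has the same length whether a UAV has one neighbor or twenty, which is one property that can let a policy trained on a small swarm be evaluated on a larger one.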

Key results

Conflict-aware graph representation captures local interactions and global formation geometry
Hierarchical policy combines discrete primitive selection with continuous trajectory refinement
Unified framework jointly optimizes formation control and target tracking
Policies trained on small swarms generalize to larger ones without retraining, validated in simulations and real-world experiments
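
The two-level structure of the hierarchical policy listed above can be sketched as a discrete choice followed by a bounded continuous correction. This is only an illustrative sketch under stated assumptions: the `PRIMITIVES` table, the nearest-primitive selection rule, and the `max_adjust` bound are hypothetical stand-ins, whereas the paper's policy learns both levels.

```python
import math

# Hypothetical primitive library: name -> (vx, vy) velocity command in m/s.
PRIMITIVES = {
    "hover": (0.0, 0.0),
    "forward": (1.0, 0.0),
    "back": (-1.0, 0.0),
    "left": (0.0, 1.0),
    "right": (0.0, -1.0),
}

def select_primitive(desired_vel):
    """High level: pick the primitive nearest to the desired velocity.

    A hand-written rule standing in for a learned discrete selector.
    """
    return min(PRIMITIVES, key=lambda name: math.dist(PRIMITIVES[name], desired_vel))

def refine(primitive_vel, desired_vel, max_adjust=0.3):
    """Low level: move toward the desired velocity by at most max_adjust."""
    dx = desired_vel[0] - primitive_vel[0]
    dy = desired_vel[1] - primitive_vel[1]
    norm = math.hypot(dx, dy)
    scale = min(1.0, max_adjust / norm) if norm > 0 else 0.0
    return (primitive_vel[0] + scale * dx, primitive_vel[1] + scale * dy)

def hierarchical_action(desired_vel):
    """Return the chosen primitive name and the refined velocity command."""
    name = select_primitive(desired_vel)
    return name, refine(PRIMITIVES[name], desired_vel)

name, command = hierarchical_action((2.0, 0.0))
```

Bounding the continuous correction keeps the final command close to a primitive that is known to be flyable, which is one way to read the summary's emphasis on dynamically feasible maneuvers.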

Why it matters

The framework offers a scalable, data-driven way to coordinate large UAV swarms in complex environments, a step toward practical deployment in surveillance, search-and-rescue, and environmental monitoring.

Abstract

This paper presents a graph-based multi-agent reinforcement learning framework for scalable UAV formation control and target tracking. The framework introduces a conflict-aware graph representation that aggregates neighborhood information through attention-based message passing, enabling each UAV to analyze both local interactions and global formation geometry. To generate agile and stable maneuvers, a hierarchical policy first selects motion primitives from a structured library and then refines them with continuous trajectory adjustments, ensuring smooth and dynamically feasible flight in cluttered environments. Extensive simulations and real-world experiments validate the proposed approach, demonstrating accurate target tracking, stable formation maintenance, and robust adaptation across varying swarm sizes and obstacle densities. In particular, policies trained on smaller swarms generalize effectively to larger ones without retraining, highlighting the scalability and practicality of the framework. A demonstration video is available on the project website: https://swift520.github.io/Formation-Tracking/.

Index terms

Swarm Robotics, Reinforcement Learning, Distributed Robot Systems