ICRA 2026

Learning Cooperative Strategies for Drone Swarms Using Multi-Agent Reinforcement Learning

Christian Llanes, Kyle Williams, Spencer Jensen, Samuel Coogan

AI summary

Teams of evader drones learn through reinforcement learning to coordinate so that more capable pursuers collide with each other, letting the evaders reach the goal despite their inferior individual capabilities.

Multi-Agent Reinforcement Learning, Pursuit-Evasion, Drone Swarms, Cooperative Control, Sim-to-Real Transfer, Proximal Policy Optimization

Problem

When pursuers have superior speed or control authority, individual evader drones struggle to reach targets without being captured. The paper addresses the gap in scalable, cooperative strategies for asymmetric multi-agent pursuit-evasion scenarios.

Approach

The authors train evader drone teams using Multi-Agent Proximal Policy Optimization (MAPPO) to learn coordinated maneuvers that intentionally guide superior pursuers into mutual collisions.
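
For readers unfamiliar with PPO-style training, the clipped surrogate objective that MAPPO optimizes for each agent's actor can be sketched as below. This is a minimal illustration of the standard objective, not the authors' implementation: the function name `clipped_surrogate` and the default `clip_eps=0.2` are assumptions, and the summary does not state the paper's networks, observations, or hyperparameters. In MAPPO the advantages would typically come from a critic that sees information from all agents during training.

```python
import math

def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized) over a batch.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed default.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # importance ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that collected the data, which is what keeps each update conservative.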

Key results

Developed a 6-DOF MAPPO algorithm for multi-agent pursuit-evasion
Proposed an augmented proportional navigation defense strategy for pursuers
Validated algorithm adaptability across 2v2 and 4v4 team configurations
Demonstrated successful sim-to-real transfer on Crazyflie hardware under real-world constraints
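
As background for the pursuer strategy listed above, the textbook (non-augmented) proportional navigation law commands an acceleration normal to the line of sight, proportional to the closing speed and the line-of-sight rotation rate. The planar sketch below shows only that classical law; the paper's augmented variant is not described in this summary and is not reproduced here. The function name `pn_accel_2d` and the navigation gain value are assumptions.

```python
import math

def pn_accel_2d(rel_pos, rel_vel, nav_gain=3.0):
    """Classical planar proportional navigation (illustrative only).

    rel_pos, rel_vel: target position/velocity minus pursuer's, as (x, y).
    Returns the commanded pursuer acceleration (ax, ay), normal to the
    line of sight. nav_gain is an assumed, commonly used value.
    """
    rx, ry = rel_pos
    vx, vy = rel_vel
    rng = math.hypot(rx, ry)
    los_rate = (rx * vy - ry * vx) / (rng * rng)  # line-of-sight angular rate
    closing_speed = -(rx * vx + ry * vy) / rng    # positive when range shrinks
    magnitude = nav_gain * closing_speed * los_rate
    # Unit vector perpendicular to the line of sight (rotated +90 degrees).
    nx, ny = -ry / rng, rx / rng
    return (magnitude * nx, magnitude * ny)
```

For example, with the target one unit ahead, closing at unit speed and drifting to the left of the line of sight, the command points left, which drives the line-of-sight rate toward zero.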

Why it matters

Provides a scalable framework for less capable drone swarms to defeat superior opponents through learned coordination, advancing robust multi-agent autonomy for defense and search missions.

Abstract

In this work, we investigate cooperative strategies for an evader drone team of various sizes using multi-agent reinforcement learning in a multi-agent pursuit-evasion scenario. The objective of the evader team is to reach a goal with minimal velocity while not colliding with the pursuer team. The objective of the pursuer team is to defend the goal by catching evaders before they reach it. In this environment, we allow the pursuer to have superior control authority compared to the evader such that reaching the goal is challenging for the evader in a one-on-one scenario. The proposed strategy for an evader is to team up with an ally to lead pursuers into a collision with each other instead of intercepting the evader. We design policies using multi-agent proximal policy optimization, an actor-critic reinforcement learning method, and investigate how the learned strategy changes when we vary the size of the pursuer and evader teams. Finally, we demonstrate the learned policy’s sim-to-real capabilities through a hardware demonstration.
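
The episode outcomes described in the abstract can be made concrete with a small sketch. Everything here is hypothetical: the function names, the radii, and the speed threshold are illustrative assumptions, and reading "minimal velocity" as "arriving below a speed limit" is one interpretation rather than the paper's stated definition.

```python
import math

def pursuers_collided(pursuer_positions, collision_radius=0.1):
    """True if any two pursuers are within collision_radius (assumed value)."""
    n = len(pursuer_positions)
    return any(
        math.dist(pursuer_positions[i], pursuer_positions[j]) < collision_radius
        for i in range(n)
        for j in range(i + 1, n)
    )

def evader_status(evader_pos, evader_vel, pursuer_positions, goal,
                  capture_radius=0.1, goal_radius=0.2, max_goal_speed=0.5):
    """Classify one evader as 'captured', 'reached_goal', or 'in_flight'.

    All thresholds are assumed for illustration. Capture is checked first,
    so an evader caught at the goal still counts as captured.
    """
    if any(math.dist(evader_pos, p) < capture_radius for p in pursuer_positions):
        return "captured"
    speed = math.hypot(*evader_vel)
    if math.dist(evader_pos, goal) < goal_radius and speed <= max_goal_speed:
        return "reached_goal"
    return "in_flight"
```

Under these assumptions, the cooperative strategy in the abstract corresponds to evaders steering so that `pursuers_collided` becomes true before any evader is classified as captured.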

Index terms

Swarm Robotics, Cooperating Robots, Reinforcement Learning