ICRA 2026

Online Planning for Multi-UAV Pursuit-Evasion in Unknown Environments Using Deep Reinforcement Learning

Jiayu Chen, Chao Yu, Guosheng Li, Wenhao Tang, Shilong Ji, Xinyi Yang, Botian Xu, Huazhong Yang, Yu Wang

AI summary

A novel MARL framework enables multi-UAV teams to cooperatively capture evaders in unknown 3D environments and deploy directly to real quadrotors without fine-tuning.

Multi-UAV pursuit-evasion, deep reinforcement learning, sim-to-real transfer, cooperative control, adaptive curriculum, zero-shot deployment

Problem

Existing RL-based pursuit-evasion methods are largely confined to simplified 2D simulations or fixed scenarios, failing to address 3D dynamics, partial observability, and the sim-to-real gap for real-world deployment.

Approach

The authors introduce OPEN, which integrates an attention-based evader prediction network for partial observability, an adaptive environment generator for efficient curriculum learning, and calibrated dynamics with reward refinement to output collective thrust and body rate commands for zero-shot sim-to-real transfer.
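To make the idea of an adaptive environment generator more concrete, here is a minimal sketch of one common way to build a curriculum sampler. It is not the authors' implementation. It assumes the generator keeps an archive of task configurations on which the current policy achieves an intermediate capture rate, and replays them with some probability. The class name `AdaptiveTaskSampler`, the task parameters (`num_obstacles`, `evader_speed`), and all thresholds are hypothetical.

```python
import random


class AdaptiveTaskSampler:
    """Hypothetical curriculum sampler (an assumption, not the OPEN code).

    Tasks whose measured capture rate is neither too low nor too high are
    archived and replayed, so training concentrates on scenarios that are
    hard but solvable for the current policy.
    """

    def __init__(self, low=0.2, high=0.8, replay_prob=0.7, capacity=1000, seed=0):
        self.low = low                  # below this, the task is treated as too hard
        self.high = high                # above this, the task is treated as solved
        self.replay_prob = replay_prob  # chance of replaying an archived task
        self.capacity = capacity        # maximum archive size
        self.archive = []
        self.rng = random.Random(seed)

    def random_task(self):
        # Hypothetical task parameters; a real generator would also place
        # obstacles and choose initial positions for pursuers and evader.
        return {
            "num_obstacles": self.rng.randint(0, 5),
            "evader_speed": self.rng.uniform(0.5, 2.0),
        }

    def sample(self):
        # Replay an archived task with probability replay_prob,
        # otherwise explore a freshly randomized task.
        if self.archive and self.rng.random() < self.replay_prob:
            return dict(self.rng.choice(self.archive))
        return self.random_task()

    def update(self, task, capture_rate):
        # Archive only tasks of intermediate difficulty.
        if self.low <= capture_rate <= self.high:
            self.archive.append(dict(task))
            if len(self.archive) > self.capacity:
                self.archive.pop(0)  # drop the oldest entry
```

In a training loop, `sample()` would be called before each rollout and `update()` after the capture rate for that task has been evaluated.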

Key results

Achieves near 100% capture rate across four test scenarios, outperforming all baselines
Improves training sample efficiency by over 50% via adaptive curriculum generation
Maintains robust capture performance against high-speed evaders unseen during training
Successfully deploys the learned policy on real Crazyflie quadrotors in a zero-shot manner

Why it matters

Provides a scalable, data-driven solution for cooperative multi-UAV operations in complex, unknown environments, bridging the gap between simulation and real-world swarm applications.

Abstract

Multi-UAV pursuit-evasion, where pursuers aim to capture evaders, poses a key challenge for UAV swarm intelligence. Multi-agent reinforcement learning (MARL) has demonstrated potential in modeling cooperative behaviors, but most RL-based approaches remain constrained to simplified simulations with limited dynamics or fixed scenarios. Previous attempts to deploy RL policy to real-world pursuit-evasion are largely restricted to two-dimensional scenarios, such as ground vehicles or UAVs at fixed altitudes. In this paper, we propose a novel MARL-based algorithm that learns online planning for multi-UAV pursuit-evasion in unknown environments (OPEN). OPEN introduces an evader prediction-enhanced network to tackle partial observability in cooperative policy learning. Additionally, OPEN proposes an adaptive environment generator within MARL training, enabling higher exploration efficiency and better policy generalization across diverse scenarios. Simulations show our method significantly outperforms all baselines in challenging scenarios, generalizing to unseen scenarios with a 100% capture rate. Finally, after integrating calibrated dynamics models of UAVs into training, we derive a feasible policy via a two-stage reward refinement and deploy the policy on real quadrotors in a zero-shot manner. To our knowledge, this is the first work to derive and deploy an RL-based policy using collective thrust and body rates control commands for multi-UAV pursuit-evasion in unknown environments. The open-source code and videos are available at https://sites.google.com/view/pursuit-evasion-rl.
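As a rough illustration of what a collective thrust and body rates (CTBR) command interface can look like, the sketch below maps a normalized four-dimensional policy output to one thrust value and three body rates. This is an assumption for illustration only: the function name `action_to_ctbr`, the `[-1, 1]` action range, and the limits `MAX_THRUST` and `MAX_BODY_RATE` are hypothetical and are not taken from the paper.

```python
# Hypothetical limits, chosen only for illustration.
MAX_THRUST = 1.0                  # normalized collective thrust
MAX_BODY_RATE = (3.0, 3.0, 1.5)   # roll, pitch, yaw rate limits in rad/s


def action_to_ctbr(action):
    """Map a policy output in [-1, 1]^4 to (thrust, body_rates).

    action[0] is rescaled from [-1, 1] to [0, MAX_THRUST];
    action[1:4] are scaled to symmetric body-rate limits.
    """
    # Clip first so out-of-range network outputs cannot exceed the limits.
    a = [max(-1.0, min(1.0, float(x))) for x in action[:4]]
    thrust = 0.5 * (a[0] + 1.0) * MAX_THRUST
    body_rates = tuple(r * limit for r, limit in zip(a[1:], MAX_BODY_RATE))
    return thrust, body_rates
```

For example, `action_to_ctbr([0, 1, -1, 2])` clips the last component to 1 and yields a mid-range thrust with roll, pitch, and yaw rates at their respective limits.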

Index terms

Reinforcement Learning, Cooperating Robots, Multi-Robot Systems