Online Planning for Multi-UAV Pursuit-Evasion in Unknown Environments Using Deep Reinforcement Learning
Jiayu Chen, Chao Yu, Guosheng Li, Wenhao Tang, Shilong Ji, Xinyi Yang, Botian Xu, Huazhong Yang, Yu Wang
AI summary
Problem
Existing RL-based pursuit-evasion methods are largely confined to simplified 2D simulations or fixed scenarios, failing to address 3D dynamics, partial observability, and the sim-to-real gap for real-world deployment.
Approach
The authors introduce OPEN, which integrates an attention-based evader prediction network for partial observability, an adaptive environment generator for efficient curriculum learning, and calibrated dynamics with reward refinement to output collective thrust and body rate commands for zero-shot sim-to-real transfer.
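As an illustration of what an adaptive environment generator for curriculum learning can look like, here is a minimal sketch. It is not the authors' implementation. It assumes each training scenario is summarized by a single scalar task parameter (for example, a hypothetical obstacle-density value in [0, 1]) and that a capture rate can be measured per task. The class name `AdaptiveTaskSampler`, the intermediate-difficulty band, and all default values are assumptions made for this example.

```python
import random


class AdaptiveTaskSampler:
    """Hypothetical curriculum sampler that replays tasks of intermediate difficulty."""

    def __init__(self, low=0.0, high=1.0, band=(0.3, 0.7),
                 p_archive=0.7, noise=0.05, capacity=1000, seed=0):
        self.low, self.high = low, high   # range of the scalar task parameter
        self.band = band                  # capture-rate band treated as "worth replaying"
        self.p_archive = p_archive        # probability of sampling near an archived task
        self.noise = noise                # perturbation applied to an archived task
        self.capacity = capacity          # maximum archive size
        self.archive = []
        self.rng = random.Random(seed)

    def sample(self):
        """Return a task parameter in [low, high]."""
        if self.archive and self.rng.random() < self.p_archive:
            # Local expansion: perturb a task that was neither trivial nor hopeless.
            base = self.rng.choice(self.archive)
            task = base + self.rng.uniform(-self.noise, self.noise)
            return min(self.high, max(self.low, task))
        # Otherwise explore the whole task space uniformly.
        return self.rng.uniform(self.low, self.high)

    def update(self, task, capture_rate):
        """Archive the task only if its capture rate falls inside the band."""
        lo, hi = self.band
        if lo <= capture_rate <= hi:
            self.archive.append(task)
            if len(self.archive) > self.capacity:
                self.archive.pop(0)  # drop the oldest entry


sampler = AdaptiveTaskSampler()
task = sampler.sample()
sampler.update(task, capture_rate=0.5)  # kept: intermediate difficulty
sampler.update(0.9, capture_rate=1.0)   # not kept: already solved
```

The design choice in this sketch is to concentrate training on tasks the current policy solves only some of the time, while still drawing a fraction of tasks uniformly so that coverage of the full scenario space is not lost.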
Key results
- Achieves near 100% capture rate across four test scenarios, outperforming all baselines
- Improves training sample efficiency by over 50% via adaptive curriculum generation
- Maintains robust capture performance against high-speed evaders unseen during training
- Successfully deploys the learned policy on real Crazyflie quadrotors in a zero-shot manner
Why it matters
Provides a scalable, data-driven solution for cooperative multi-UAV operations in complex, unknown environments, bridging the gap between simulation and real-world swarm applications.
Abstract
Multi-UAV pursuit-evasion, where pursuers aim to capture evaders, poses a key challenge for UAV swarm intelligence. Multi-agent reinforcement learning (MARL) has demonstrated potential in modeling cooperative behaviors, but most RL-based approaches remain constrained to simplified simulations with limited dynamics or fixed scenarios. Previous attempts to deploy RL policy to real-world pursuit-evasion are largely restricted to two-dimensional scenarios, such as ground vehicles or UAVs at fixed altitudes. In this paper, we propose a novel MARL-based algorithm that learns online planning for multi-UAV pursuit-evasion in unknown environments (OPEN). OPEN introduces an evader prediction-enhanced network to tackle partial observability in cooperative policy learning. Additionally, OPEN proposes an adaptive environment generator within MARL training, enabling higher exploration efficiency and better policy generalization across diverse scenarios. Simulations show our method significantly outperforms all baselines in challenging scenarios, generalizing to unseen scenarios with a 100% capture rate. Finally, after integrating calibrated dynamics models of UAVs into training, we derive a feasible policy via a two-stage reward refinement and deploy the policy on real quadrotors in a zero-shot manner. To our knowledge, this is the first work to derive and deploy an RL-based policy using collective thrust and body rates control commands for multi-UAV pursuit-evasion in unknown environments. The open-source code and videos are available at https://sites.google.com/view/pursuit-evasion-rl.
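To make the collective thrust and body rates command interface concrete, the following sketch maps a normalized four-dimensional policy output to such a command. It is an illustration only: the action layout, the normalization of thrust to [0, 1], the body-rate limit `MAX_BODY_RATE`, and the names `CTBRCommand` and `action_to_ctbr` are assumptions, not details given in the abstract.

```python
from dataclasses import dataclass

MAX_BODY_RATE = 3.0  # rad/s, hypothetical limit chosen for this example


@dataclass
class CTBRCommand:
    thrust: float      # collective thrust, normalized to [0, 1]
    roll_rate: float   # rad/s
    pitch_rate: float  # rad/s
    yaw_rate: float    # rad/s


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def action_to_ctbr(action):
    """Map a policy output in [-1, 1]^4 to a thrust and body-rate command."""
    if len(action) != 4:
        raise ValueError("expected 4 values: thrust, roll, pitch, yaw rate")
    a = [clip(float(v), -1.0, 1.0) for v in action]
    thrust = 0.5 * (a[0] + 1.0)  # rescale [-1, 1] to [0, 1]
    return CTBRCommand(
        thrust=thrust,
        roll_rate=a[1] * MAX_BODY_RATE,
        pitch_rate=a[2] * MAX_BODY_RATE,
        yaw_rate=a[3] * MAX_BODY_RATE,
    )


# An out-of-range roll component (1.5) is clipped before scaling.
cmd = action_to_ctbr([0.0, 1.5, -0.5, 0.0])
```

Clipping before scaling keeps every command inside the assumed actuator limits even when the policy output leaves its nominal range.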