ICRA 2026

Agile Flight Emerges from Multi-Agent Competitive Racing

Vineet Pasumarti, Lorenzo Bianchi, Antonio Loquercio


AI summary


Sparse, competitive rewards naturally induce agile flight and tactical strategies in drone racing agents, outperforming dense progress-based rewards and enabling more reliable real-world transfer.

Multi-agent reinforcement learning, Autonomous drone racing, Emergent behavior, Sim-to-real transfer, Sparse rewards, Agile flight

Problem

Current RL approaches to autonomous drone racing rely on dense, progress-based rewards that prescribe specific trajectories. This constrains exploration and fails to capture the complex tactical behaviors needed in high-complexity environments.
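For illustration, a dense progress reward of the kind described above can be sketched as the per-step reduction in distance to the next gate. This is a minimal sketch under stated assumptions: the function name `progress_reward`, the `scale` factor, and the use of straight-line distance to the gate center are hypothetical choices, not the formulation used in the paper.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def progress_reward(prev_pos: Vec3, pos: Vec3, next_gate: Vec3, scale: float = 1.0) -> float:
    """Dense progress reward (hypothetical sketch).

    Returns how much closer the drone moved to the center of its next gate
    during one control step: positive when approaching, negative when
    moving away. `scale` is an assumed weighting factor.
    """
    before = math.dist(prev_pos, next_gate)  # distance at the previous step
    after = math.dist(pos, next_gate)        # distance at the current step
    return scale * (before - after)


# Example: moving 1 m straight toward a gate 5 m ahead earns a reward of 1.0.
print(progress_reward((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)))
```

Because this reward is paid on every step for moving toward the gate center, it implicitly prescribes a path: a maneuver that temporarily moves away from the gate, such as swinging wide to overtake, is penalized on those steps even if it would help win the race.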

Approach

The authors train two drone agents in a competitive head-to-head racing setup using only sparse rewards for winning laps and passing gates, allowing advanced low-level control and high-level strategies to emerge organically.
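A sparse reward of this kind can be sketched as follows. This is a hypothetical illustration: the `StepEvents` fields, the function name `sparse_competitive_reward`, and the bonus values `GATE_BONUS` and `LAP_WIN_BONUS` are assumptions, since the summary does not give the paper's exact reward terms or magnitudes.

```python
from dataclasses import dataclass

# Hypothetical reward magnitudes; the paper's values are not stated here.
GATE_BONUS = 1.0
LAP_WIN_BONUS = 10.0


@dataclass
class StepEvents:
    """Events observed for one agent during a single simulation step."""
    passed_gate: bool  # the agent crossed its next gate this step
    won_lap: bool      # the agent completed a lap ahead of its opponent


def sparse_competitive_reward(events: StepEvents) -> float:
    """Sparse reward: nonzero only when a gate is passed or a lap is won."""
    reward = 0.0
    if events.passed_gate:
        reward += GATE_BONUS
    if events.won_lap:
        reward += LAP_WIN_BONUS
    return reward


# Most steps carry no reward at all; only discrete race events are paid.
print(sparse_competitive_reward(StepEvents(passed_gate=False, won_lap=False)))  # 0.0
print(sparse_competitive_reward(StepEvents(passed_gate=True, won_lap=True)))    # 11.0
```

On most steps neither event fires and the reward is zero, so the agent is never told how to fly between gates; speed and tactics have to be discovered through competition with the opponent.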

Key results

Emergence of agile flight and tactical behaviors from sparse rewards
Outperforms dense progress-based rewards in obstacle-rich tracks
Achieves more reliable zero-shot sim-to-real transfer
Demonstrates generalization to unseen opponents

Why it matters

It shows that simple competitive objectives can replace complex reward engineering, providing a scalable framework for training robust, strategic autonomous agents that transfer effectively to physical hardware.

Abstract

Through multi-agent competition and the sparse high-level objective of winning a race, we find that both agile flight (e.g., high-speed motion pushing the platform to its physical limits) and strategy (e.g., overtaking or blocking) emerge from agents trained with reinforcement learning. We provide evidence in both simulation and the real world that this approach outperforms the common paradigm of training agents in isolation with rewards that prescribe behavior, e.g., progress on the raceline, in particular when the complexity of the environment increases, e.g., in the presence of obstacles. Moreover, we find that multi-agent competition yields policies that transfer more reliably to the real world than policies trained with a single-agent progress-based reward, despite the two methods using the same simulation environment, randomization strategy, and hardware. In addition to improved sim-to-real transfer, the multi-agent policies also exhibit some degree of generalization to opponents unseen at training time. Overall, our work, following in the tradition of multi-agent competitive game-play in digital domains, shows that sparse task-level rewards are sufficient for training agents capable of advanced low-level control in the physical world.

Index terms

Aerial Systems: Perception and Autonomy; Machine Learning for Robot Control; Aerial Systems: Applications