Agile Flight Emerges from Multi-Agent Competitive Racing
Vineet Pasumarti, Lorenzo Bianchi, Antonio Loquercio
AI summary
Problem
Current RL approaches for autonomous drone racing rely on dense, progress-based rewards that prescribe specific trajectories, which constrains exploration and fails to capture complex tactical behaviors needed in high-complexity environments.
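To make "dense, progress-based reward" concrete, here is a minimal sketch of one common form: the per-step reduction in distance to the next gate. The function name `progress_reward`, the `scale` parameter, and the example coordinates are illustrative assumptions, not details taken from the paper.

```python
import math

def progress_reward(prev_pos, pos, gate_center, scale=1.0):
    """Hypothetical dense reward: how much closer the drone got to the
    next gate during this step (negative if it moved away)."""
    d_prev = math.dist(prev_pos, gate_center)
    d_now = math.dist(pos, gate_center)
    return scale * (d_prev - d_now)

gate = (5.0, 0.0, 1.0)
# Flying straight at the gate is rewarded on every step.
approaching = progress_reward((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), gate)
# A sideways move (for example, swinging wide around an obstacle) is penalized.
detour = progress_reward((0.0, 0.0, 1.0), (0.0, 1.0, 1.0), gate)
```

Because the signal is positive only when the drone closes distance along the prescribed direction, any maneuver that temporarily gives up progress is discouraged, which is the exploration constraint described above.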
Approach
The authors train two drone agents in a competitive head-to-head racing setup using only sparse rewards for winning laps and passing gates, allowing advanced low-level control and high-level strategies to emerge organically.
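A sparse reward of this kind might look like the sketch below. Only the two event types (passing a gate, winning a lap) come from the description above; the names `StepEvents` and `sparse_race_reward` and the bonus magnitudes are hypothetical placeholders, not the authors' values.

```python
from dataclasses import dataclass

@dataclass
class StepEvents:
    """Hypothetical per-step event flags reported by a racing simulator."""
    passed_gate: bool = False
    won_lap: bool = False  # finished the lap ahead of the opponent

def sparse_race_reward(events, gate_bonus=1.0, win_bonus=10.0):
    """Reward is zero on most steps; it is paid only on task-level events.
    The bonus sizes are assumptions chosen for illustration."""
    reward = 0.0
    if events.passed_gate:
        reward += gate_bonus
    if events.won_lap:
        reward += win_bonus
    return reward

nothing_happened = sparse_race_reward(StepEvents())
gate_only = sparse_race_reward(StepEvents(passed_gate=True))
gate_and_win = sparse_race_reward(StepEvents(passed_gate=True, won_lap=True))
```

Nothing in this signal says how to fly between gates, so the trajectory and any overtaking or blocking behavior are left for the learning process to discover.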
Key results
- Emergence of agile flight and tactical behaviors from sparse rewards
- Outperforms dense progress-based rewards in obstacle-rich tracks
- Achieves more reliable zero-shot sim-to-real transfer
- Demonstrates generalization to unseen opponents
Why it matters
It shows that simple competitive objectives can replace complex reward engineering, providing a scalable framework for training robust, strategic autonomous agents that transfer effectively to physical hardware.
Abstract
Through multi-agent competition and the sparse high-level objective of winning a race, we find that both agile flight (e.g., high-speed motion pushing the platform to its physical limits) and strategy (e.g., overtaking or blocking) emerge from agents trained with reinforcement learning. We provide evidence in both simulation and the real world that this approach outperforms the common paradigm of training agents in isolation with rewards that prescribe behavior, e.g., progress on the raceline, in particular when the complexity of the environment increases, e.g., in the presence of obstacles. Moreover, we find that multi-agent competition yields policies that transfer more reliably to the real world than policies trained with a single-agent progress-based reward, despite the two methods using the same simulation environment, randomization strategy, and hardware. In addition to improved sim-to-real transfer, the multi-agent policies also exhibit some degree of generalization to opponents unseen at training time. Overall, our work, following in the tradition of multi-agent competitive game-play in digital domains, shows that sparse task-level rewards are sufficient for training agents capable of advanced low-level control in the physical world.