Precedence-Aware Multi-UAV Task Allocation with an Attention-Based Reinforcement Learning Framework
Xurui Liu
AI summary
Problem
Coordinating multi-UAV teams under strict precedence, energy, and return-to-base constraints is computationally intractable for exact solvers and poorly represented by standard heuristics or deep reinforcement learning methods.
Approach
We introduce C2T-JA, an end-to-end deep reinforcement learning framework that uses a dual-branch hybrid attention encoder to explicitly model multi-hop causal dependencies, paired with a context-aware decoder that generates globally feasible joint actions.
Key results
- Explicitly models multi-hop causal dependencies via a causal extension mechanism
- Achieves higher task completion rates than exact solvers, heuristics, and learning baselines
- Reduces decision time by several orders of magnitude in large-scale scenarios
- Guarantees strict satisfaction of precedence, energy, and return-to-base constraints via dynamic masking
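The dynamic masking in the last bullet can be illustrated with a minimal sketch. The summary does not describe the implementation, so everything here is an assumption made for illustration: the function name `feasible_mask`, the Euclidean flight-cost model, and the `cost_per_unit` parameter are all hypothetical. A task is marked selectable only if it is not yet done, all of its predecessors are complete, and the UAV has enough energy to reach it and then return to base.

```python
import math


def feasible_mask(uav_pos, energy, base, tasks, done, preds, cost_per_unit=1.0):
    """Return one boolean per task: True if the UAV may select it now.

    Hypothetical model: energy use is proportional to Euclidean distance.
    tasks: list of (x, y) task positions, indexed by task id
    done:  set of completed task ids
    preds: dict mapping task id -> iterable of predecessor task ids
    """
    mask = []
    for tid, pos in enumerate(tasks):
        # A completed task is never selectable again.
        if tid in done:
            mask.append(False)
            continue
        # Precedence constraint: every predecessor must already be done.
        if not all(p in done for p in preds.get(tid, ())):
            mask.append(False)
            continue
        # Energy and return-to-base constraint: fly to the task, then home.
        needed = cost_per_unit * (math.dist(uav_pos, pos) + math.dist(pos, base))
        mask.append(needed <= energy)
    return mask


tasks = [(1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
preds = {1: [0]}  # task 1 requires task 0
base = (0.0, 0.0)

initial = feasible_mask(base, 5.0, base, tasks, set(), preds)
after_first = feasible_mask((1.0, 0.0), 4.0, base, tasks, {0}, preds)
```

In a learned policy, a mask like this would typically be applied to the decoder's action scores before sampling, so that infeasible tasks receive zero probability.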
Why it matters
Enables scalable, constraint-aware coordination for real-world multi-UAV missions like search-and-rescue and environmental monitoring where task dependencies are complex.
Abstract
Multi-UAV coordination is critical for complex real-world applications, but these missions are often constrained by intricate causal dependencies between tasks, alongside strict UAV energy and return-to-base constraints. Existing methods, ranging from exact solvers to standard deep reinforcement learning approaches, struggle to scale with the combinatorial complexity of this problem and often fail to effectively represent the underlying logical task structures. To address this gap, we propose the Causal-Channel Transformer for Joint Allocation (C2T-JA), an end-to-end reinforcement learning framework. The core of C2T-JA is a dual-branch hybrid attention encoder that explicitly constructs and reasons over multi-hop, disentangled causal channels, effectively decoupling logical dependencies from spatial task features. Building on this rich representation, a context-aware decoder generates a globally coordinated joint action for the entire team. We evaluated C2T-JA against established baselines, including an exact solver (Gurobi), a conventional heuristic (OR-Tools), and a leading learning-based approach (AM-joint), on procedurally generated benchmarks of varying scales and dependency structures. The results demonstrate that our approach consistently produces higher-quality solutions, measured by task completion rates, while reducing decision times by several orders of magnitude, particularly in large-scale scenarios.
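One way to picture "multi-hop, disentangled causal channels" is as a stack of per-hop relation matrices derived from the precedence graph. The abstract does not specify how the channels are built, so the following is only a sketch under that assumption: the function name `multi_hop_channels` and the exact-k-hop definition are hypothetical choices, not the paper's construction. Channel k marks the task pairs connected by a precedence path of exactly k edges, which keeps direct and indirect dependencies in separate matrices.

```python
def multi_hop_channels(n_tasks, edges, max_hops):
    """Build one boolean matrix per hop count (hypothetical construction).

    channels[k - 1][i][j] is True iff the precedence graph contains a
    path of exactly k edges from task i to task j.
    edges: iterable of (predecessor, successor) task-id pairs
    """
    one_hop = [[False] * n_tasks for _ in range(n_tasks)]
    for i, j in edges:
        one_hop[i][j] = True

    channels = [one_hop]
    for _ in range(max_hops - 1):
        prev = channels[-1]
        # Boolean matrix product: extend every (k-1)-hop path by one edge.
        nxt = [
            [
                any(prev[i][m] and one_hop[m][j] for m in range(n_tasks))
                for j in range(n_tasks)
            ]
            for i in range(n_tasks)
        ]
        channels.append(nxt)
    return channels


# Chain of dependencies: 0 -> 1 -> 2 -> 3
channels = multi_hop_channels(4, [(0, 1), (1, 2), (2, 3)], max_hops=3)
```

Each matrix could then act as a separate attention mask or bias in one branch of an encoder, while a second branch attends over spatial features alone. That split is one plausible reading of "decoupling logical dependencies from spatial task features", not a confirmed detail of C2T-JA.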