ICRA 2026

Precedence-Aware Multi-UAV Task Allocation with an Attention-Based Reinforcement Learning Framework

Xurui Liu

AI summary

C2T-JA decouples causal task dependencies from spatial features, achieving higher task completion rates than existing solvers and cutting decision times by several orders of magnitude in large-scale multi-UAV coordination.

Multi-UAV coordination, task allocation, reinforcement learning, causal dependencies, attention mechanism, constraint satisfaction

Problem

Coordinating multi-UAV teams under strict precedence, energy, and return-to-base constraints is computationally intractable for exact solvers and poorly represented by standard heuristics or deep reinforcement learning methods.

Approach

We introduce C2T-JA, an end-to-end deep reinforcement learning framework that uses a dual-branch hybrid attention encoder to explicitly model multi-hop causal dependencies, paired with a context-aware decoder that generates globally feasible joint actions.
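As a rough illustration of what "multi-hop causal dependencies" could look like as encoder input, the sketch below derives one boolean channel per hop count from a precedence graph. The function name `hop_channels`, the edge-list format, and the idea of keeping one matrix per hop are assumptions made for this example; the summary does not specify how C2T-JA builds its causal channels.

```python
from typing import List, Tuple

Matrix = List[List[bool]]


def hop_channels(n_tasks: int, edges: List[Tuple[int, int]], max_hops: int) -> List[Matrix]:
    """Return channels where channels[k - 1][i][j] is True when task i
    precedes task j through a path of exactly k precedence edges."""
    # One-hop channel: direct precedence edges (i must finish before j).
    direct = [[False] * n_tasks for _ in range(n_tasks)]
    for i, j in edges:
        direct[i][j] = True

    channels = [direct]
    for _ in range(max_hops - 1):
        prev = channels[-1]
        # Boolean matrix product (prev x direct) extends every path by one edge.
        nxt = [
            [any(prev[i][m] and direct[m][j] for m in range(n_tasks))
             for j in range(n_tasks)]
            for i in range(n_tasks)
        ]
        channels.append(nxt)
    return channels


# Hypothetical example: chain 0 -> 1 -> 2 -> 3 plus a shortcut 0 -> 2.
channels = hop_channels(4, [(0, 1), (1, 2), (2, 3), (0, 2)], max_hops=3)
```

In an attention encoder, matrices like these might serve as per-channel attention masks or biases, kept separate from the spatial features of each task. That use is also an assumption rather than a description of the published architecture.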

Key results

Explicitly models multi-hop causal dependencies via a causal extension mechanism
Achieves higher task completion rates than exact solvers, heuristics, and learning baselines
Reduces decision time by several orders of magnitude in large-scale scenarios
Guarantees strict satisfaction of precedence, energy, and return-to-base constraints via dynamic masking
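The dynamic masking mentioned in the last result can be sketched as a per-step feasibility check for a single UAV. Everything concrete here is an assumption for illustration: the function name `feasibility_mask`, the 2D coordinates, and an energy model in which energy spent equals Euclidean distance flown. The paper's actual masking rules and energy model are not given in this summary.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def feasibility_mask(
    pos: Point,
    energy: float,
    base: Point,
    tasks: List[Point],
    done: Set[int],
    preds: Dict[int, Set[int]],
) -> List[bool]:
    """Return mask[j] == True when the UAV may select task j now."""
    mask = []
    for j, loc in enumerate(tasks):
        # Precedence: every predecessor of j must already be completed.
        unlocked = preds.get(j, set()) <= done
        # Energy and return-to-base: the UAV must reach the task and still get home.
        # Assumed energy model: one unit of energy per unit of distance.
        round_trip = math.dist(pos, loc) + math.dist(loc, base)
        mask.append(j not in done and unlocked and round_trip <= energy)
    return mask


# Hypothetical scenario: task 2 requires task 0; task 1 is too far away.
base = (0.0, 0.0)
tasks = [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
preds = {2: {0}}

first = feasibility_mask(base, 10.0, base, tasks, set(), preds)
# After completing task 0 at (3, 4), with 6 units of energy left.
second = feasibility_mask((3.0, 4.0), 6.0, base, tasks, {0}, preds)
```

A policy would apply such a mask before sampling, for example by setting the logits of infeasible tasks to negative infinity, so that only feasible actions can be chosen. When every entry is False, returning to base is the only remaining option.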

Why it matters

Enables scalable, constraint-aware coordination for real-world multi-UAV missions like search-and-rescue and environmental monitoring where task dependencies are complex.

Abstract

Multi-UAV coordination is critical for complex real-world applications, but these missions are often constrained by intricate causal dependencies between tasks, alongside strict UAV energy and return-to-base constraints. Existing methods, ranging from exact solvers to standard deep reinforcement learning approaches, struggle to scale with the combinatorial complexity of this problem and often fail to effectively represent the underlying logical task structures. To address this gap, we propose the Causal-Channel Transformer for Joint Allocation (C2T-JA), an end-to-end reinforcement learning framework. The core of C2T-JA is a dual-branch hybrid attention encoder that explicitly constructs and reasons over multi-hop, disentangled causal channels, effectively decoupling logical dependencies from spatial task features. Building on this rich representation, a context-aware decoder generates a globally coordinated joint action for the entire team. We evaluated C2T-JA against established baselines, including an exact solver (Gurobi), a conventional heuristic (OR-Tools), and a leading learning-based approach (AM-joint), on procedurally generated benchmarks of varying scales and dependency structures. The results demonstrate that our approach consistently produces higher-quality solutions, measured by task completion rates, while reducing decision times by several orders of magnitude, particularly in large-scale scenarios.

Index terms

Integrated Planning and Learning; Motion and Path Planning; Reinforcement Learning