HIPPO-MAT: Decentralized Task Allocation Using GraphSAGE and Multi-Agent Deep Reinforcement Learning
Lavanya Ratnabala, Robinroy Peter, Aleksey Fedoseev, and Dzmitry Tsetserukou
AI summary
Problem
Continuous, concurrent task allocation in dynamic 3D heterogeneous multi-agent systems remains challenging due to the brittleness of centralized methods and the lack of conflict resolution and navigation coupling in existing decentralized approaches.
Approach
Each agent constructs a local ego-centric graph of peers, encodes it with a pretrained GraphSAGE network, and uses Independent PPO for decentralized decision-making, tightly coupled with a reservation-based A* planner for collision-free navigation.
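As a rough illustration of the encoding step, the sketch below builds an ego-centric neighbor set from 3D positions and applies one mean-aggregating GraphSAGE-style layer. It is a minimal sketch, not the authors' implementation: the function names `ego_neighbors` and `sage_mean_layer`, the split into separate self and neighbor weight matrices (equivalent to one matrix applied to the concatenation), and the ReLU nonlinearity are all assumptions made for illustration.

```python
import math

def ego_neighbors(positions, ego, comm_range):
    """Indices of peers within comm_range of agent `ego` (3D Euclidean distance).

    Hypothetical helper: the paper's actual graph construction may differ.
    """
    return [j for j, p in enumerate(positions)
            if j != ego and math.dist(positions[ego], p) <= comm_range]

def sage_mean_layer(h_self, h_neigh, w_self, w_neigh):
    """One mean-aggregator layer: ReLU(W_self @ h_self + W_neigh @ mean(h_neigh)).

    h_self  : feature vector of the ego agent (list of floats)
    h_neigh : list of neighbor feature vectors (may be empty)
    w_self, w_neigh : weight matrices as lists of rows (assumed shapes: out x in)
    """
    dim = len(h_self)
    if h_neigh:
        mean = [sum(h[k] for h in h_neigh) / len(h_neigh) for k in range(dim)]
    else:
        # Assumption: an isolated agent aggregates a zero vector.
        mean = [0.0] * dim
    out = []
    for row_s, row_n in zip(w_self, w_neigh):
        z = (sum(a * b for a, b in zip(row_s, h_self))
             + sum(a * b for a, b in zip(row_n, mean)))
        out.append(max(0.0, z))  # ReLU
    return out

# Toy usage: three agents, agent 0 can only hear agent 1.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
peers = ego_neighbors(positions, ego=0, comm_range=2.0)
embedding = sage_mean_layer(features[0], [features[j] for j in peers],
                            identity, identity)
```

In a decentralized setting, each agent would run this on its own ego-graph and pass the resulting embedding to its own policy, so no global graph is ever assembled.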
Key results
- 91% first-decision conflict-free success rate in simulation
- 89.6% conflict-free success rate in real-world deployment
- Near-optimal cost gap (~16.8%) vs centralized Hungarian algorithm
- Fully decentralized training and execution for scalable coordination
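To make the cost-gap metric concrete, the sketch below compares the total cost of a given assignment with the optimal one-to-one assignment. The formula (relative excess over the optimum, in percent) is an assumed definition, since the summary does not spell it out. For brevity, the optimum is found by brute-force enumeration, which yields the same minimum as the Hungarian algorithm on small square cost matrices but does not scale; the function names are hypothetical.

```python
from itertools import permutations

def optimal_assignment_cost(cost):
    """Minimum total cost over all one-to-one agent-to-task assignments.

    Brute force stand-in for the Hungarian algorithm; only viable for small n.
    cost[i][t] is the cost of agent i taking task t (square matrix assumed).
    """
    n = len(cost)
    return min(sum(cost[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def cost_gap_percent(cost, assignment):
    """Relative excess cost of `assignment` over the optimum, in percent.

    assignment[i] is the task index chosen by agent i.
    Assumed definition: 100 * (achieved - optimal) / optimal.
    """
    achieved = sum(cost[i][t] for i, t in enumerate(assignment))
    best = optimal_assignment_cost(cost)
    return 100.0 * (achieved - best) / best

# Toy usage with a 3x3 cost matrix (made-up numbers, not from the paper).
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
gap = cost_gap_percent(cost, [0, 1, 2])  # achieved cost 6, optimal cost 5
```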
Why it matters
Provides a scalable, robust alternative to centralized task allocation for heterogeneous drone and ground robot fleets operating in dynamic 3D environments.
Abstract
We address the problem of decentralized continuous task allocation in heterogeneous multi-agent systems operating in three-dimensional environments. We propose HIPPO-MAT, a fully decentralized framework that combines a GraphSAGE-based graph neural network with Independent Proximal Policy Optimization (IPPO) to enable concurrent and conflict-aware decision-making. Each agent constructs an ego-centric dynamic graph over peers within its communication range, computes embeddings via a mean-aggregating GraphSAGE encoder, and feeds these into its own independent policy. To improve stability, the encoder is pretrained with supervised learning, using an encoder–decoder reconstruction loss on synthetic ego-graphs, before reinforcement learning begins. This design allows heterogeneous agents such as unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to allocate tasks continuously and in parallel without relying on centralized critics or coordination. Navigation is achieved with a reservation-based A* planner coupled with onboard SLAM, ensuring collision avoidance. We validate the approach extensively in simulation with up to 30 agents and in real-world deployment on JetBot ROS AI robots running policies locally on Jetson Nano boards with ESP32-S3 modules for ESP-NOW peer-to-peer communication. Results demonstrate a first-decision conflict-free success rate (CFSR) of 91% in simulation with up to 30 agents and 89.6% on robots, a near-optimal cost gap averaging 16.8% relative to the centralized Hungarian algorithm, and significantly faster allocation times.
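The idea behind a reservation-based A* planner can be sketched as a search over space-time states in which cells already claimed by other agents at a given time step are treated as blocked. The example below is a simplified illustration under several assumptions that are not taken from the paper: a 2D grid instead of 3D space, unit-cost moves with waiting allowed, a reservation table given as a set of `(x, y, t)` tuples, and no check for swap conflicts between neighboring agents. The name `reserved_astar` is hypothetical.

```python
import heapq

def reserved_astar(grid_size, start, goal, reserved, max_t=50):
    """A* over (cell, t) states on a 2D grid with a space-time reservation table.

    grid_size : (width, height)
    start, goal : (x, y) cells
    reserved : set of (x, y, t) entries claimed by other agents
    max_t : search horizon; returns None if the goal is not reached in time
    Returns the path as a list of cells, one per time step, or None.
    """
    w, h = grid_size

    def heuristic(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start, [start])]
    seen = set()
    while open_heap:
        _, t, cell, path = heapq.heappop(open_heap)
        if cell == goal:
            return path
        if (cell, t) in seen or t >= max_t:
            continue
        seen.add((cell, t))
        x, y = cell
        # Wait in place or move to a 4-connected neighbor.
        for dx, dy in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h and (nx, ny, t + 1) not in reserved:
                nxt = (nx, ny)
                heapq.heappush(
                    open_heap,
                    (t + 1 + heuristic(nxt), t + 1, nxt, path + [nxt]))
    return None

# Toy usage: another agent has reserved cell (1, 0) at time step 1,
# so the planner has to wait or detour before passing through it.
free_path = reserved_astar((3, 3), (0, 0), (2, 0), reserved=set())
blocked_path = reserved_astar((3, 3), (0, 0), (2, 0), reserved={(1, 0, 1)})
```

After planning, an agent would add its own path cells and time steps to the shared reservation table so that later planners route around them; how the paper distributes that table among agents is not described in this section.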