ICRA 2026

HIPPO-MAT: Decentralized Task Allocation Using GraphSAGE and Multi-Agent Deep Reinforcement Learning

Lavanya Ratnabala, Robinroy Peter, Aleksey Fedoseev, and Dzmitry Tsetserukou

AI summary

HIPPO-MAT enables scalable, conflict-aware decentralized task allocation for heterogeneous 3D multi-agent systems with near-optimal performance and high real-world success rates.

Multi-agent systems, Decentralized task allocation, Graph neural networks, Independent PPO, Real-world deployment, 3D navigation

Problem

Continuous, concurrent task allocation in dynamic 3D heterogeneous multi-agent systems remains challenging due to the brittleness of centralized methods and the lack of conflict resolution and navigation coupling in existing decentralized approaches.

Approach

Each agent constructs a local ego-centric graph of peers, encodes it with a pretrained GraphSAGE network, and uses Independent PPO for decentralized decision-making, tightly coupled with a reservation-based A* planner for collision-free navigation.
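The reservation idea behind the planner can be illustrated with a minimal space-time A* sketch. This is not the authors' implementation: the 2D grid, the 4-connected moves plus a wait action, the `(x, y, t)` reservation set, and the function names `plan_with_reservations` and `reserve` are all assumptions made for illustration. The sketch only prevents two agents from occupying the same cell at the same time step; it does not handle head-on swaps or 3D motion.

```python
import heapq


def plan_with_reservations(start, goal, grid_size, reserved, max_time=50):
    """Space-time A* on a 4-connected grid (hypothetical helper).

    reserved: set of (x, y, t) cells already claimed by other agents.
    Returns a list of (x, y) positions indexed by time step, or None.
    """
    width, height = grid_size

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Heap entries: (f = g + h, g = elapsed time, cell, path so far).
    frontier = [(heuristic(start), 0, start, [start])]
    visited = set()
    while frontier:
        _, t, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if (cell, t) in visited or t >= max_time:
            continue
        visited.add((cell, t))
        x, y = cell
        # Waiting in place is a legal move, so an agent can yield to another.
        for nx, ny in [(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if not (0 <= nx < width and 0 <= ny < height):
                continue
            if (nx, ny, t + 1) in reserved:
                continue  # another agent holds this cell at the next step
            nxt = (nx, ny)
            heapq.heappush(
                frontier, (t + 1 + heuristic(nxt), t + 1, nxt, path + [nxt])
            )
    return None


def reserve(path, reserved):
    """Claim every (x, y, t) along a planned path."""
    for t, (x, y) in enumerate(path):
        reserved.add((x, y, t))


reserved = set()
# Agent A crosses the middle row and reserves its path first.
path_a = plan_with_reservations((0, 1), (2, 1), (3, 3), reserved)
reserve(path_a, reserved)
# Agent B would cross (1, 1) at t = 1, which A has reserved, so it waits one step.
path_b = plan_with_reservations((1, 0), (1, 2), (3, 3), reserved)
print(path_a)
print(path_b)
```

Planning agents one after another against a shared reservation set is the simplest ordering; in a decentralized setting each agent would need to learn its peers' reservations through communication, which this sketch leaves out.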

Key results

91% first-decision conflict-free success rate in simulation with up to 30 agents
89.6% conflict-free success rate in real-world deployment
Near-optimal average cost gap of 16.8% relative to the centralized Hungarian algorithm
Fully decentralized training and execution for scalable coordination
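The two headline metrics can be made concrete with a small sketch. The definitions here are assumptions, since this summary does not spell them out: the cost gap is taken as the relative excess of an allocation's total cost over the optimal one-to-one assignment, and the conflict-free success rate is taken as the fraction of allocation rounds in which no two agents pick the same task on their first decision. An exhaustive search stands in for the Hungarian algorithm; it finds the same optimum but is only practical for very small teams. The cost matrix and choices are made-up illustrative numbers, not results from the paper.

```python
from itertools import permutations


def total_cost(assignment, cost):
    """Sum cost[agent][task] over a list mapping agent index -> task index."""
    return sum(cost[agent][task] for agent, task in enumerate(assignment))


def optimal_assignment(cost):
    """Minimum-cost one-to-one assignment by exhaustive search.

    Stand-in for the Hungarian algorithm: same optimum, O(n!) time.
    """
    n = len(cost)
    return min(permutations(range(n)), key=lambda p: total_cost(p, cost))


def cost_gap_percent(assignment, cost):
    """Relative excess cost over the optimal assignment, in percent."""
    best = total_cost(optimal_assignment(cost), cost)
    return 100.0 * (total_cost(assignment, cost) - best) / best


def conflict_free_rate(first_choices_per_round):
    """Fraction of rounds in which all first choices are distinct tasks."""
    clean = sum(
        1 for choices in first_choices_per_round
        if len(set(choices)) == len(choices)
    )
    return clean / len(first_choices_per_round)


# Illustrative travel costs: cost[agent][task].
cost = [
    [2, 4, 9],
    [3, 3, 9],
    [9, 9, 5],
]
decentralized = [1, 0, 2]  # agent 0 -> task 1, agent 1 -> task 0, agent 2 -> task 2
gap = cost_gap_percent(decentralized, cost)

# Four rounds of first decisions; the second round has two agents on task 1.
rounds = [[0, 1, 2], [1, 1, 2], [2, 0, 1], [0, 2, 1]]
cfsr = conflict_free_rate(rounds)
print(gap, cfsr)
```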

Why it matters

Provides a scalable, robust alternative to centralized task allocation for heterogeneous drone and ground robot fleets operating in dynamic 3D environments.

Abstract

We address the problem of decentralized continuous task allocation in heterogeneous multi-agent systems operating in three-dimensional environments. We propose HIPPO-MAT, a fully decentralized framework that combines a GraphSAGE-based graph neural network with Independent Proximal Policy Optimization (IPPO) to enable concurrent and conflict-aware decision-making. Each agent constructs an ego-centric dynamic graph over peers within its communication range, computes embeddings via a mean-aggregating GraphSAGE encoder, and feeds these into its own independent policy. To improve stability, the encoder is pretrained in a supervised manner with an encoder–decoder reconstruction loss on synthetic ego-graphs before reinforcement learning begins. This design allows heterogeneous agents such as unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to allocate tasks continuously and in parallel without relying on centralized critics or coordination. Navigation is achieved with a reservation-based A* planner coupled with onboard SLAM, ensuring collision avoidance. We validate the approach extensively in simulation with up to 30 agents and in real-world deployment on JetBot ROS AI robots running policies locally on Jetson Nano boards, with ESP32-S3 modules providing ESP-NOW peer-to-peer communication. Results demonstrate a 91% first-decision conflict-free success rate (CFSR) in simulation with up to 30 agents and 89.6% on robots, a near-optimal cost gap averaging 16.8% relative to the centralized Hungarian algorithm, and significantly faster allocation times.
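The ego-graph and mean-aggregation steps described in the abstract can be sketched in a few lines of plain Python. Everything specific here is an assumption for illustration: the agent names, the use of raw 3D positions as node features, the single layer, and the identity weight matrices, which stand in for the pretrained encoder weights that the paper would supply. The layer follows the standard GraphSAGE mean-aggregator form, ReLU(W_self · h_v + W_neigh · mean(h_u)).

```python
import math


def build_ego_graph(ego_id, positions, comm_range):
    """Return ids of peers within comm_range of the ego agent (3D Euclidean)."""
    ego = positions[ego_id]
    return [
        pid for pid, pos in positions.items()
        if pid != ego_id and math.dist(ego, pos) <= comm_range
    ]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sage_mean_layer(ego_feat, neighbor_feats, w_self, w_neigh):
    """One GraphSAGE mean-aggregator step for the ego node."""
    dim = len(ego_feat)
    if neighbor_feats:
        mean = [
            sum(f[i] for f in neighbor_feats) / len(neighbor_feats)
            for i in range(dim)
        ]
    else:
        mean = [0.0] * dim  # isolated agent: no neighbor message
    pre = [
        a + b
        for a, b in zip(matvec(w_self, ego_feat), matvec(w_neigh, mean))
    ]
    return [max(0.0, v) for v in pre]  # ReLU


# Hypothetical fleet: positions double as node features in this sketch.
positions = {
    "uav0": (0.0, 0.0, 2.0),
    "ugv1": (1.0, 0.0, 0.0),
    "ugv2": (0.0, 2.0, 2.0),
    "uav3": (10.0, 0.0, 2.0),  # out of communication range
}
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # placeholder weights

neighbors = build_ego_graph("uav0", positions, comm_range=3.0)
embedding = sage_mean_layer(
    positions["uav0"],
    [positions[n] for n in neighbors],
    identity,
    identity,
)
print(neighbors, embedding)
```

Because each agent builds the graph only from peers it can hear, the embedding changes as agents move in and out of range, and it is this per-agent embedding that would be fed to the independent policy.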

Index terms

Task Planning, Multi-Robot Systems, Reinforcement Learning