ICRA 2026

Decentralized Reinforcement Learning for Multi-Agent Multi-Resource Allocation Via Dynamic Cluster Agreements

Antonio Marino, Esteban Restrepo, Claudio Pacchierotti, Paolo Robuffo Giordano

AI summary

LGTC-IPPO enables decentralized multi-agent teams to dynamically form adaptive sub-teams, achieving stable coordination and robust multi-resource allocation without global information.

Multi-Agent Reinforcement Learning, Dynamic Clustering, Resource Allocation, Decentralized Control, Multi-Robot Systems, Cluster Consensus

Problem

Centralized resource allocation methods fail to scale in large multi-agent systems due to communication and computational limits, while existing decentralized approaches struggle with credit assignment, local minima, and dynamic environmental changes.

Approach

The authors introduce LGTC-IPPO, a decentralized reinforcement learning framework that combines Independent Proximal Policy Optimization with a learned dynamic cluster consensus mechanism, allowing agents to self-organize into local sub-teams based on real-time resource demands.
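To make "cluster consensus" concrete, here is a minimal sketch of a textbook linear consensus update restricted to neighbours that share a cluster label. This is not the authors' learned LGTC mechanism: the cluster labels below are fixed and hand-assigned (an assumption; in LGTC-IPPO the grouping is learned and changes over time), and the function name `cluster_consensus_step` and the gain value are hypothetical.

```python
import numpy as np

def cluster_consensus_step(x, adjacency, labels, step=0.2):
    """One synchronous consensus step over intra-cluster edges only (hypothetical helper).

    x:         (n, d) agent states, e.g. local estimates of resource demand
    adjacency: (n, n) symmetric 0/1 communication graph with zero diagonal
    labels:    (n,) integer cluster label per agent (fixed here by assumption)
    step:      consensus gain; should satisfy step < 1 / (max intra-cluster degree)
    """
    same_cluster = labels[:, None] == labels[None, :]
    w = adjacency * same_cluster              # drop edges between different clusters
    degree = w.sum(axis=1, keepdims=True)     # intra-cluster degree of each agent
    # Laplacian update: x_i <- x_i + step * sum_j w_ij (x_j - x_i)
    return x + step * (w @ x - degree * x)

# Six agents on a path graph 0-1-2-3-4-5, split into two sub-teams.
n = 6
adjacency = np.zeros((n, n))
for i in range(n - 1):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])
x = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])

for _ in range(200):
    x = cluster_consensus_step(x, adjacency, labels)

# Each sub-team agrees on the average of its own initial states (2.0 and 11.0),
# because the edge between agents 2 and 3 crosses clusters and is ignored.
print(x.ravel())
```

Because the intra-cluster weights are symmetric, each sub-team preserves its own average, so the two groups settle on different values instead of a single team-wide one.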

Key results

Dynamic cluster consensus enables adaptive sub-team formation for targeted resource delivery
Hybrid reward structure effectively balances global demand satisfaction with local collision avoidance
Outperforms VDN, QMIX, and MOMAPPO in reward stability, coordination, and scalability
Validated via simulation and real-world drone experiments under varying team sizes and resource depletion dynamics
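The summary does not give the exact form of the hybrid reward. As a hedged illustration of how a reward can mix a shared team signal with a local penalty, the sketch below assumes a global term equal to the fraction of outstanding demand satisfied in one step (identical for all agents) and a per-agent penalty for each neighbour closer than a safety distance. The function name `hybrid_reward`, the weights, and `safe_dist` are all hypothetical.

```python
import numpy as np

def hybrid_reward(demand, delivered, positions, safe_dist=0.5,
                  w_global=1.0, w_collision=1.0):
    """Hypothetical hybrid reward: shared demand term plus per-agent collision term.

    demand:    (m,) remaining demand per resource type (shared by the team)
    delivered: (m,) amount of each resource type delivered by the team this step
    positions: (n, 2) agent positions
    Returns an (n,) array of per-agent rewards.
    """
    # Global term: fraction of outstanding demand satisfied this step (same for all agents).
    total = demand.sum()
    satisfied = np.minimum(delivered, demand).sum() / total if total > 0 else 0.0
    # Local term: count neighbours closer than safe_dist for each agent.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)            # an agent never collides with itself
    collisions = (dist < safe_dist).sum(axis=1)
    return w_global * satisfied - w_collision * collisions

r = hybrid_reward(demand=np.array([4.0, 2.0]),
                  delivered=np.array([2.0, 3.0]),
                  positions=np.array([[0.0, 0.0], [0.3, 0.0], [5.0, 5.0]]))
# 4 of 6 demand units are met (over-delivery of type 2 is capped at its demand);
# agents 0 and 1 are 0.3 apart, so both pay one collision penalty.
print(r)
```

Keeping the global term shared while making the penalty local is one simple way to give every agent the same task objective while still attributing collisions to the agents that cause them.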

Why it matters

Provides a scalable, communication-efficient coordination strategy for large-scale multi-robot applications like disaster response and logistics where centralized control is impractical.

Abstract

This paper addresses the challenge of allocating heterogeneous resources among multiple agents in a decentralized manner. Our proposed method, Liquid-Graph-Time Clustering-IPPO (LGTC-IPPO), builds upon Independent Proximal Policy Optimization (IPPO) by integrating dynamic cluster consensus, a mechanism that allows agents to form and adapt local sub-teams based on resource demands. This decentralized coordination strategy reduces reliance on global information and enhances scalability. We evaluate LGTC-IPPO against standard multi-agent reinforcement learning baselines and a centralized expert solution across a range of team sizes and resource distributions. Experimental results demonstrate that LGTC-IPPO achieves more stable rewards, better coordination, and robust performance even as the number of agents or resource types increases. Additionally, we illustrate how dynamic clustering enables agents to reallocate resources efficiently, even in scenarios where resources discharge over time.
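For readers unfamiliar with IPPO: each agent optimizes the standard PPO clipped surrogate objective, L = -E[min(r·A, clip(r, 1-ε, 1+ε)·A)] with probability ratio r = π_new/π_old, on its own trajectories, treating the other agents as part of the environment. The minimal sketch below shows only that per-agent loss computation, with NumPy arrays standing in for policy log-probabilities and advantages. It omits the value function, entropy bonus, and the LGTC clustering layers, and the helper names `ppo_clip_loss` and `ippo_losses` are hypothetical.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (to be minimised) for one agent's batch."""
    ratio = np.exp(logp_new - logp_old)                     # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))        # pessimistic bound

def ippo_losses(batches, clip_eps=0.2):
    """IPPO: each agent's loss uses only its own batch (no joint critic or value mixing)."""
    return {agent: ppo_clip_loss(b["logp_new"], b["logp_old"], b["adv"], clip_eps)
            for agent, b in batches.items()}

batches = {
    # Agent "a": ratio 1.5 with positive advantage, clipped to 1.2 -> loss -2.4
    "a": {"logp_new": np.log([0.75]), "logp_old": np.log([0.5]), "adv": np.array([2.0])},
    # Agent "b": ratio 0.5 with negative advantage, clipped to 0.8 -> loss 0.8
    "b": {"logp_new": np.log([0.25]), "logp_old": np.log([0.5]), "adv": np.array([-1.0])},
}
losses = ippo_losses(batches)
print(losses)
```

In both cases the clip removes any incentive to move the policy further than a factor of 1 ± ε per update, which is the stability property PPO, and therefore IPPO, relies on.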

Index terms

Multi-Robot Systems, Reinforcement Learning, Swarm Robotics