ICRA 2026

Self-Organised Sequential Multi-Agent Reinforcement Learning for Closely Cooperation Tasks

Hao Fu, You Mingyu, Zhou Hongjun, Bin He

AI summary

SOS-MARL leverages sequential decision-making and automatic agent grouping to solve closely cooperative tasks, achieving a 36% average improvement in task completion over existing multi-agent reinforcement learning methods.

MARL, Sequential Decision, Closely Cooperative Tasks, Automatic Grouping, Multi-Agent Systems, Box-Pushing

Problem

Standard multi-agent reinforcement learning struggles with closely cooperative tasks where simultaneous actions are required, as individual optimal policies often conflict with group optima and trap agents in local Nash equilibria.
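
To make the local-equilibrium problem concrete, here is a minimal sketch of a two-agent box-pushing game. The payoff numbers (10 for a joint push, -1 for pushing alone, 0 for staying) and the names `payoff` and `is_nash` are illustrative assumptions, not values or code from the paper. The sketch enumerates the pure-strategy Nash equilibria and shows that "both stay" is an equilibrium even though "both push" is better for everyone.

```python
from itertools import product

ACTIONS = ("stay", "push")

def payoff(me, other):
    """Hypothetical per-agent payoff: the box moves only if both agents push."""
    if me == "push" and other == "push":
        return 10   # box moves, both agents are rewarded
    if me == "push":
        return -1   # wasted effort: pushing alone does not move the box
    return 0        # staying costs nothing and earns nothing

def is_nash(a1, a2):
    """True if neither agent can gain by unilaterally changing its action."""
    agent1_ok = all(payoff(a1, a2) >= payoff(d, a2) for d in ACTIONS)
    agent2_ok = all(payoff(a2, a1) >= payoff(d, a1) for d in ACTIONS)
    return agent1_ok and agent2_ok

equilibria = [joint for joint in product(ACTIONS, repeat=2) if is_nash(*joint)]
print(equilibria)
```

Under these assumed payoffs, an agent facing a partner that stays has no incentive to start pushing alone, which is how independent learners can remain stuck in the worse equilibrium.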

Approach

The method converts parallel agent decisions into sequential, autoregressive action selection within automatically formed groups, using recursive reward decomposition to align individual policies with global objectives.
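
The sequential idea can be illustrated with a small sketch, assuming a tabular joint action value `q` is already available for one group. This is not the paper's implementation: the function `sequential_greedy`, the table layout, and the toy rewards are hypothetical, and the learned networks and recursive reward decomposition are not modelled. Each agent picks its action in turn, conditioned on the actions already chosen, and scores each candidate by the best value the remaining agents could still reach.

```python
from itertools import product

def sequential_greedy(q, n_agents, n_actions):
    """Choose actions one agent at a time (autoregressively).

    q maps a full joint-action tuple to a value. Agent i sees the prefix
    chosen by agents 0..i-1 and assumes agents i+1.. respond optimally.
    """
    prefix = ()
    for i in range(n_agents):
        remaining = n_agents - i - 1

        def best_completion(a):
            # Best value reachable if agent i plays `a` after the current prefix.
            tails = product(range(n_actions), repeat=remaining)
            return max(q[prefix + (a,) + tail] for tail in tails)

        prefix += (max(range(n_actions), key=best_completion),)
    return prefix

# Toy group of three agents; action 1 = push, action 0 = stay.
# Assumed reward: 10 if all push, otherwise minus the wasted pushing effort.
q = {
    joint: 10 if all(joint) else -sum(joint)
    for joint in product(range(2), repeat=3)
}
chosen = sequential_greedy(q, n_agents=3, n_actions=2)
print(chosen)
```

With an exact table, this procedure returns a joint action that maximises `q`, which is the sense in which conditioning on earlier agents' choices aligns each individual choice with the group optimum. The enumeration over completions is exponential in group size, which is one reason to keep groups small.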

Key results

Sequential decision-making framework that aligns individual and group optima
Automatic grouping mechanism based on state-action coupling scores
36% average improvement in task completion rate over state-of-the-art MARL algorithms
Successful deployment and validation in both simulated and real-world box-pushing environments
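
As a rough illustration of grouping by coupling scores, the sketch below assumes that a symmetric matrix of pairwise coupling scores has already been computed; how the paper derives these scores from states and actions is not described here. The function `group_agents`, the threshold, and the example scores are hypothetical. Agents whose coupling meets the threshold are merged into the same group using a union-find structure.

```python
def group_agents(coupling, threshold):
    """Group agents whose pairwise coupling score is at least `threshold`.

    coupling is a symmetric n x n list of lists. Groups are the connected
    components of the graph that links strongly coupled pairs.
    """
    n = len(coupling)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if coupling[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Assumed scores: agents 0 and 1 push one box, agents 2 and 3 push another.
coupling = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]
groups = group_agents(coupling, threshold=0.5)
print(groups)
```

Thresholding plus connected components is only one simple way to turn scores into groups; it is used here because it needs no assumptions beyond the score matrix itself.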

Why it matters

Enables scalable and reliable coordination for multi-robot systems in tasks requiring precise simultaneous action, advancing real-world robotic deployment.

Abstract

Cooperative tasks are common in multi-agent systems. Closely cooperative tasks are a special case in which changing the state of the environment requires multiple agents to perform a specific operation at the same time. In a box-pushing task, for example, the box is heavy and must be pushed by multiple agents simultaneously. The optimal action of each agent in a closely cooperative task depends on the actions of the other agents, so an individually optimal action may be inconsistent with the group-optimal action. This gives the problem more Nash equilibrium policies that are not globally optimal, and makes it easier for a policy learned by reinforcement learning to fall into one of these locally optimal policies. In this paper, we propose a self-organised sequential multi-agent reinforcement learning algorithm (SOS-MARL). We propose sequential decision-making to change the optimization objective of each agent's policy so that the learned policy tends toward the group-optimal policy. We also propose an automatic grouping mechanism that makes training and inference smoother in large-scale agent environments. Outside the groups, we decompose the joint action value into a combination of the group action values, thus guiding the agents to improve their group policies in a fine-grained manner. We deployed scenarios in both simulated and real environments and compared SOS-MARL with various classical MARL algorithms on box-pushing tasks, demonstrating the state-of-the-art performance of our method.

Index terms

Reinforcement Learning, Multi-Robot Systems, Cooperating Robots