ICRA 2026

Scalable Multi-Agent Reinforcement Learning Framework for Multi-Machine Tending

Abdalwhab Bakheet Mohamed Abdalwhab, Giovanni Beltrame, David St-Onge

AI summary

SMAPPO enables scalable, decentralized multi-robot coordination for manufacturing with minimal retraining, significantly outperforming the state-of-the-art MAPPO baseline in productivity, safety, and adaptability.

Multi-Agent Reinforcement Learning, Scalable Robotics, Decentralized Control, Machine Tending, Zero-Shot Generalization, Industrial Automation

Problem

Current multi-robot manufacturing systems rely on centralized control or fixed-input reinforcement learning models, which lack scalability, create single points of failure, and require costly retraining when system configurations change.

Approach

The authors developed SMAPPO, a decentralized multi-agent reinforcement learning framework featuring a novel attention-based observation encoder that dynamically processes varying numbers of robots, machines, and storage areas without requiring fixed input sizes or extensive retraining.
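
The summary describes the encoder only as attention-based. A minimal sketch of the general idea follows, assuming single-head attention pooling with one learned query per entity type (robots, machines, storage areas) and random weights standing in for trained ones. Each variable-length set of entity feature rows is pooled into a fixed-size vector, so the downstream policy always receives the same input width. The names `attention_pool` and `SetObservationEncoder`, and all dimensions, are hypothetical illustrations, not the authors' design.

```python
import numpy as np


def attention_pool(tokens: np.ndarray, query: np.ndarray,
                   w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """Pool a variable number of entity feature rows (n, d_in) into one (d_out,) vector
    using single-head scaled dot-product attention against a learned query."""
    if tokens.shape[0] == 0:                  # no entities of this type are visible
        return np.zeros(w_v.shape[1])
    keys = tokens @ w_k                       # (n, d_k)
    values = tokens @ w_v                     # (n, d_out)
    scores = keys @ query / np.sqrt(query.shape[0])   # (n,)
    weights = np.exp(scores - scores.max())   # subtract max for numerical stability
    weights /= weights.sum()                  # softmax over the n entities
    return weights @ values                   # (d_out,), same size for any n


class SetObservationEncoder:
    """Hypothetical encoder: one attention pool per entity type, outputs concatenated."""

    def __init__(self, feature_dims: dict, d_k: int = 16, d_out: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)     # random weights stand in for trained ones
        self.types = sorted(feature_dims)     # fixed order so the output layout is stable
        self.params = {
            name: (rng.normal(size=d_k),                              # query
                   rng.normal(size=(d_in, d_k)) / np.sqrt(d_in),      # key projection
                   rng.normal(size=(d_in, d_out)) / np.sqrt(d_in))    # value projection
            for name, d_in in feature_dims.items()
        }

    def encode(self, obs: dict) -> np.ndarray:
        return np.concatenate([attention_pool(obs[t], *self.params[t]) for t in self.types])


enc = SetObservationEncoder({"robots": 4, "machines": 3, "storage": 2})
small = {"robots": np.ones((2, 4)), "machines": np.ones((3, 3)), "storage": np.ones((1, 2))}
large = {"robots": np.ones((8, 4)), "machines": np.ones((12, 3)), "storage": np.ones((4, 2))}
print(enc.encode(small).shape, enc.encode(large).shape)  # (96,) (96,)
```

Because the softmax is taken over however many rows are present, the output size depends only on `d_out` and the number of entity types, and reordering the entities leaves the result unchanged. Under these assumptions, that is one way a single trained policy could be reused when robots or machines are added or removed.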

Key results

Up to 61% performance improvement with full retraining
Up to 45% higher productivity and 49% fewer collisions via curriculum learning
Up to 272% better zero-shot generalization to new scales
Up to 100% increase in parts delivery under low initial training
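
The summary does not state how these percentages are computed. Assuming they are relative improvements over the MAPPO baseline (an assumption, not confirmed by the source), "up to 100% increase" would mean twice the baseline's parts delivered, and "272% better" would mean 3.72 times the baseline score. The helper names below are hypothetical, and the numbers are illustrative, not results from the paper.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Percentage improvement of `new` over `baseline` (requires baseline > 0)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


def implied_ratio(improvement_pct: float) -> float:
    """Invert the formula: how many times the baseline a given % improvement implies."""
    return 1.0 + improvement_pct / 100.0


# Hypothetical numbers, not from the paper:
print(relative_improvement(new=20.0, baseline=10.0))  # 100.0
print(implied_ratio(272.0))  # 3.72
```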

Why it matters

SMAPPO enables flexible, resilient scaling of decentralized robotic fleets in manufacturing, reducing infrastructure costs and retraining effort for Industry 5.0 applications.

Abstract

Robotic manipulators hold significant untapped potential for manufacturing industries, particularly when deployed in multi-robot configurations that can enhance resource utilization, increase throughput, and reduce costs. However, industrial manipulators typically operate in isolated one-robot, one-machine setups, limiting both utilization and scalability. Even mobile robot implementations generally rely on centralized architectures, creating vulnerability to single points of failure and requiring robust communication infrastructure. This paper introduces SMAPPO (Scalable Multi-Agent Proximal Policy Optimization), a scalable, input-size-invariant multi-agent reinforcement learning model for decentralized multi-robot management in industrial environments. MAPPO (Multi-Agent Proximal Policy Optimization) represents the current state-of-the-art approach. We optimized an existing simulator to handle complex multi-agent reinforcement learning scenarios and designed a new multi-machine tending scenario for evaluation. Our novel observation encoder enables SMAPPO to handle varying numbers of agents, machines, and storage areas with minimal or no retraining. Results demonstrate SMAPPO’s superior performance compared to the state-of-the-art MAPPO across multiple conditions: full retraining (up to 61% improvement), curriculum learning (up to 45% increased productivity and up to 49% fewer collisions), zero-shot generalization to significantly different scale scenarios (up to 272% better performance without retraining), and adaptability under extremely low initial training (up to 100% increase in parts delivery).
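
SMAPPO's name indicates that it builds on Proximal Policy Optimization (PPO), as MAPPO does. For reference, a minimal sketch of the standard PPO clipped surrogate loss is shown below; MAPPO-style methods typically apply it to samples pooled from all agents that share one actor. The source does not specify SMAPPO's exact loss, value function, or hyperparameters, so the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions.

```python
import numpy as np


def ppo_clip_loss(logp_new: np.ndarray, logp_old: np.ndarray,
                  advantages: np.ndarray, clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective, averaged over samples.

    In a MAPPO-style setup, samples from every agent (all sharing one actor)
    can be concatenated into these arrays.
    """
    ratio = np.exp(logp_new - logp_old)              # pi_new(a|o) / pi_old(a|o)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) objective per sample, then negate for minimization.
    return float(-np.mean(np.minimum(unclipped, clipped)))


# Two hypothetical agents, one sample each; the new policy doubled both action probabilities.
logp_old = np.log(np.array([0.2, 0.3]))
logp_new = np.log(np.array([0.4, 0.6]))
adv = np.array([1.0, -1.0])
print(ppo_clip_loss(logp_new, logp_old, adv))  # approximately 0.4
```

Clipping removes the incentive to push the probability ratio outside [1 - ε, 1 + ε] in the direction the advantage favors, which keeps each update close to the policy that collected the data.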

Index terms

AI and Machine Learning in Manufacturing and Logistics Systems; Path Planning for Multiple Mobile Robots or Agents; Integrated Planning and Control