Scalable Multi-Agent Reinforcement Learning Framework for Multi-Machine Tending
Abdalwhab Bakheet Mohamed Abdalwhab, Giovanni Beltrame, David St-Onge
AI summary
Problem
Current multi-robot manufacturing systems rely on centralized control or fixed-input reinforcement learning models, which lack scalability, create single points of failure, and require costly retraining when system configurations change.
Approach
The authors developed SMAPPO, a decentralized multi-agent reinforcement learning framework featuring a novel attention-based observation encoder that dynamically processes varying numbers of robots, machines, and storage areas without requiring fixed input sizes or extensive retraining.
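The summary describes the encoder only at a high level. The sketch below shows one common way to make an observation encoder independent of how many entities are present. Each robot, machine, or storage area is projected into a shared embedding with a type-specific linear map, and the projections are pooled with scaled dot-product attention queried by the observing agent, so the output length stays fixed. This is not the SMAPPO implementation. The class name `AttentionPoolEncoder`, the feature dimensions, the single attention head, and the random untrained weights are all hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionPoolEncoder:
    """Hypothetical input-size-invariant encoder (illustrative, not the SMAPPO code).

    Maps an agent's own features plus any number of other entities to a
    vector of fixed length 2 * d_model.
    """

    def __init__(self, type_dims, d_model, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
                    for _ in range(rows)]

        self.d_model = d_model
        # One projection per entity type maps raw features into a shared space.
        self.proj = {t: rand_matrix(d_model, d) for t, d in type_dims.items()}
        # Query, key, and value maps shared by all entities.
        self.W_q = rand_matrix(d_model, d_model)
        self.W_k = rand_matrix(d_model, d_model)
        self.W_v = rand_matrix(d_model, d_model)

    def encode(self, self_features, entities):
        """self_features: (type, features) of the observing agent.
        entities: list of (type, features) for every other visible entity."""
        self_type, feats = self_features
        h_self = matvec(self.proj[self_type], feats)
        if not entities:
            # Nothing to attend to: pad the context part with zeros.
            return h_self + [0.0] * self.d_model
        q = matvec(self.W_q, h_self)
        embeddings = [matvec(self.proj[t], f) for t, f in entities]
        keys = [matvec(self.W_k, h) for h in embeddings]
        values = [matvec(self.W_v, h) for h in embeddings]
        # Scaled dot-product scores between the agent's query and each key.
        scale = math.sqrt(self.d_model)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Attention-weighted sum of values: length d_model for any entity count.
        context = [sum(w * v[j] for w, v in zip(weights, values))
                   for j in range(self.d_model)]
        return h_self + context


if __name__ == "__main__":
    enc = AttentionPoolEncoder({"robot": 4, "machine": 3, "storage": 2}, d_model=8)
    me = ("robot", [0.1, 0.2, 0.0, 1.0])
    small = [("machine", [1.0, 0.0, 0.5]), ("storage", [0.3, 0.7])]
    large = small + [("robot", [0.5, 0.5, 0.1, 0.0]), ("machine", [0.0, 1.0, 0.2])]
    print(len(enc.encode(me, small)), len(enc.encode(me, large)))  # 16 16
```

Because the pooled context is a weighted sum, the encoding is also invariant to the order in which entities are listed. This is what lets a single policy network accept observations from fleets of different sizes without changing its input layer.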
Key results
- Up to 61% performance improvement with full retraining
- Up to 45% higher productivity and 49% fewer collisions via curriculum learning
- Up to 272% better zero-shot generalization to scenarios of a different scale
- Up to 100% more parts delivered after very little initial training
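The summary does not define how these percentages are computed. Assuming each one is a relative change of SMAPPO over the MAPPO baseline on the stated metric, they read as follows. Here $P_{\text{SMAPPO}}$ and $P_{\text{MAPPO}}$ are hypothetical symbols for the two methods' scores.

```latex
\Delta = \frac{P_{\text{SMAPPO}} - P_{\text{MAPPO}}}{P_{\text{MAPPO}}} \times 100\%,
\qquad
\Delta = 272\% \;\Longrightarrow\; P_{\text{SMAPPO}} = 3.72\,P_{\text{MAPPO}}
```

Under the same reading, a 100% increase in parts delivered means twice as many parts. A 49% reduction in collisions ($\Delta = -49\%$) means 0.51 times the baseline collision count.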
Why it matters
It enables flexible, resilient scaling of decentralized robotic fleets in manufacturing, reducing infrastructure costs and retraining efforts for Industry 5.0 applications.
Abstract
Robotic manipulators hold significant untapped potential for manufacturing industries, particularly when deployed in multi-robot configurations that can enhance resource utilization, increase throughput, and reduce costs. However, industrial manipulators typically operate in isolated one-robot, one-machine setups, limiting both utilization and scalability. Even mobile robot implementations generally rely on centralized architectures, which are vulnerable to single points of failure and require robust communication infrastructure. This paper introduces SMAPPO (Scalable Multi-Agent Proximal Policy Optimization), a scalable, input-size-invariant multi-agent reinforcement learning model for decentralized multi-robot management in industrial environments. Its baseline, MAPPO (Multi-Agent Proximal Policy Optimization), represents the current state of the art. We optimized an existing simulator to handle complex multi-agent reinforcement learning scenarios and designed a new multi-machine tending scenario for evaluation. Our novel observation encoder enables SMAPPO to handle varying numbers of agents, machines, and storage areas with minimal or no retraining. Results show that SMAPPO outperforms MAPPO across multiple conditions: full retraining (up to 61% improvement), curriculum learning (up to 45% higher productivity and up to 49% fewer collisions), zero-shot generalization to scenarios of significantly different scale (up to 272% better performance without retraining), and adaptability under extremely low initial training (up to 100% increase in parts delivery).
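Both SMAPPO and MAPPO build on Proximal Policy Optimization. For reference, the standard PPO clipped surrogate objective is given below. The abstract does not say whether SMAPPO modifies this loss, so treat it as background on the base algorithm rather than the paper's exact objective. Here $\pi_\theta$ is the policy, $\hat{A}_t$ the advantage estimate, and $\epsilon$ the clipping range.

```latex
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\;
\operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
```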