Cooperative-Competitive Team Play of Real-World Craft Robots
Rui Zhao, Xihui Li, Yizheng Zhang, Yuzhen Liu, Zhong Zhang, Yufeng Zhang, Cheng Zhou, Zhengyou Zhang, Lei Han
AI summary
Problem
Multi-agent reinforcement learning is hard to train efficiently for physical robot teams. Policies trained in simulation also often fail when deployed, because robots execute actions asynchronously and the real environment differs from the simulator.
Approach
The authors build a complete robotic platform with discrete and continuous simulations. They propose OODSI, which injects out-of-distribution states observed in testing environments into the training start-state distribution, and they use guided RL with action masking to accelerate learning.
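The summary does not say how OODSI or the action mask are implemented. The following is a minimal sketch under stated assumptions, not the authors' code. It assumes OODSI can be modeled as a mixture: at each episode reset, with some probability the simulator starts from a state recorded in a testing environment (for example, a real-world rollout) instead of its default reset. All names (`OODSIStartSampler`, `default_reset`, `add_test_states`, `inject_prob`, `masked_softmax`) are hypothetical, and `inject_prob` is an illustrative parameter, not a value from the paper. The action mask is shown as a masked softmax over policy logits, one common way to implement masking.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical state type (e.g. a tuple of robot poses); the paper's
# actual state representation is not given in the summary.
State = tuple


class OODSIStartSampler:
    """Hypothetical OODSI-style start-state sampler: mixes the simulator's
    default reset distribution with states recorded in a testing environment."""

    def __init__(self, default_reset: Callable[[], State],
                 inject_prob: float = 0.3, seed: int = 0):
        if not 0.0 <= inject_prob <= 1.0:
            raise ValueError("inject_prob must be in [0, 1]")
        self.default_reset = default_reset
        self.inject_prob = inject_prob  # assumed mixing weight, not from the paper
        self.ood_buffer: List[State] = []
        self.rng = random.Random(seed)

    def add_test_states(self, states: Sequence[State]) -> None:
        # Store states observed when the policy ran outside the training simulator.
        self.ood_buffer.extend(states)

    def sample(self) -> State:
        # With probability inject_prob, start from a recorded out-of-distribution
        # state; otherwise use the simulator's own reset distribution.
        if self.ood_buffer and self.rng.random() < self.inject_prob:
            return self.rng.choice(self.ood_buffer)
        return self.default_reset()


def masked_softmax(logits: Sequence[float], valid: Sequence[bool]) -> List[float]:
    """Action masking: invalid actions get probability exactly 0 and the
    remaining mass is renormalized over the valid actions."""
    if len(logits) != len(valid):
        raise ValueError("logits and valid must have the same length")
    valid_logits = [x for x, ok in zip(logits, valid) if ok]
    if not valid_logits:
        raise ValueError("at least one action must be valid")
    m = max(valid_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) if ok else 0.0 for x, ok in zip(logits, valid)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    sampler = OODSIStartSampler(default_reset=lambda: (0.0, 0.0), inject_prob=0.5)
    sampler.add_test_states([(1.5, -0.2), (2.0, 0.7)])
    starts = [sampler.sample() for _ in range(1000)]
    print(sum(s != (0.0, 0.0) for s in starts), "of 1000 episodes start from OOD states")
    print(masked_softmax([2.0, 0.0, 0.0], [False, True, True]))  # [0.0, 0.5, 0.5]
```

Keeping the default reset in the mixture, rather than training only on recorded test states, is a design choice in this sketch: the policy still sees its usual start states while gaining coverage of the states it actually encountered at test time. Masking invalid actions before sampling removes them from exploration entirely, which is one way such a mask can speed up learning.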
Key results
- Complete multi-robot platform with pyBullet and Gazebo simulations
- OODSI method to inject out-of-distribution states into training
- 20% improvement in Sim2Real performance
- Real-world deployment of cooperative and competitive team strategies
Why it matters
Enables scalable, data-driven multi-agent coordination for real-world robotics applications without relying on costly real-world data collection or manual control design.
Abstract
Multi-agent deep Reinforcement Learning (RL) has made significant progress in developing intelligent game-playing agents in recent years. However, the efficient training of collective robots using multi-agent RL and the transfer of learned policies to real-world applications remain open research questions. In this work, we first develop a comprehensive robotic system, including simulation, a distributed learning framework, and physical robot components. We then propose and evaluate reinforcement learning techniques designed for efficient training of cooperative and competitive policies on this platform. To address the challenges of multi-agent sim-to-real transfer, we introduce Out of Distribution State Initialization (OODSI) to mitigate the impact of the sim-to-real gap. In our experiments, OODSI improves Sim2Real performance by 20%. We demonstrate the effectiveness of our approach through experiments with a multi-robot car competitive game and a cooperative task in real-world settings.