Safe Multi-Agent Reinforcement Learning for Bimanual Dexterous Manipulation
Weishu Zhan, Peter Chin
Abstract
Bimanual dexterous manipulation is essential for a wide range of robotics applications, yet it poses the critical challenge of balancing intricate operational capabilities with assured safety and reliability. While safe reinforcement learning is integral to the robustness of robotic systems, safe multi-agent reinforcement learning (MARL) for the cooperative control of multiple robots has scarcely been studied. In this study, we explore MARL for safe cooperative control of multiple robot hands. Each robot must satisfy both individual and collective safety constraints to ensure safe team actions. However, the non-stationarity inherent in current algorithms hinders updating policies precisely enough to satisfy these safety constraints. We propose Multi-Agent Constrained Proximal Advantage Optimization (MACPAO), which accounts for the order of agent updates and integrates non-stationarity into sequential update schemes. The algorithm ensures consistent improvement in both reward and adherence to safety constraints at each iteration. We tested MACPAO on various tasks with safety constraints and demonstrated that it outperforms other MARL algorithms in balancing reward improvement and safety compliance. Supplementary materials and code are available at https://github.com/YONEX4090/MultiSafeHand.git.