Policy Diversification through Representation Distinguishability Regularization for Multi-Actor Deep Reinforcement Learning
Meng XU, Xinhong Chen, Shuguang Wang, Guanyi Zhao, Jianping Wang
AI summary
Problem
Existing multi-actor deep reinforcement learning methods struggle to maintain actor diversity, often resulting in redundant policies and suboptimal exploration. Directly maximizing policy divergence is infeasible and can propagate detrimental knowledge across the actor population.
Approach
The method decomposes each actor into a representation module and a decision module, then adds a loss that minimizes the inner product between each actor's representation vector and those of the other actors, encouraging diverse environmental understanding without restricting decision-making.
Key results
- Proposes a generic representation distinguishability regularization loss integrable into existing multi-actor DRL frameworks
- Provides theoretical proof linking representation diversity to a reduced gap from the optimal policy
- Improves cumulative return across nine state-of-the-art multi-actor DRL baselines on eight MuJoCo and navigation benchmarks
- Demonstrates low sensitivity to hyperparameter choices and consistent scalability across varying actor counts
Why it matters
Offers a scalable, model-agnostic solution to the exploration bottleneck in multi-actor reinforcement learning, directly benefiting robotics, autonomous control, and resource allocation applications.
Abstract
Deep reinforcement learning (DRL) has been widely applied to various applications, but improving exploration remains a key challenge. Recently, multi-actor DRL has emerged as a promising approach that enhances exploration by simultaneously deploying multiple actors for learning. Among these methods, actor diversity helps actors discover better policies. However, existing multi-actor DRL methods still lack effective techniques to promote actor diversity, leading to homogeneous, redundant actors and suboptimal policies. To address this, this work proposes a generic solution that can be seamlessly integrated into existing multi-actor DRL methods to promote actor diversity, thereby enabling better policy learning. Specifically, we decompose each actor into a representation module and a decision-making module, where the representation module receives the environment state and outputs a representation vector for the decision module to generate actions. We then compute the difference between each actor’s representation vector and those of all other actors as an additional loss, referred to as representation distinguishability regularization, and train the actor alongside its original loss to promote actor diversity. We demonstrate that our method effectively improves the performance of nine state-of-the-art (SOTA) multi-actor DRL methods across eight benchmark tasks, in terms of return.
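The decomposition and the extra loss described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the `Actor` class, its single-layer modules, the name `distinguishability_loss`, the unit-length normalization of the representation vectors, and the weight `reg_weight` are all assumptions made for illustration. The sketch only computes the regularization term; a real training loop would use an automatic-differentiation framework to minimize it together with each actor's original loss.

```python
import numpy as np


class Actor:
    """Hypothetical two-part actor: a representation module feeding a decision module."""

    def __init__(self, state_dim, rep_dim, action_dim, rng):
        # Single linear layers stand in for whatever networks a real method uses.
        self.w_rep = rng.normal(scale=0.1, size=(state_dim, rep_dim))
        self.w_dec = rng.normal(scale=0.1, size=(rep_dim, action_dim))

    def represent(self, states):
        # Representation module: environment state -> representation vector.
        return np.tanh(states @ self.w_rep)

    def act(self, states):
        # Decision module: representation vector -> action.
        return np.tanh(self.represent(states) @ self.w_dec)


def distinguishability_loss(reps, eps=1e-8):
    """Mean inner product between each actor's representation and every other actor's.

    reps has shape (num_actors, batch, rep_dim): reps[i] holds actor i's
    representations of one shared batch of states. Returns shape (num_actors,).
    Assumption: vectors are scaled to unit length first, so the inner product
    is a bounded cosine similarity.
    """
    num_actors = reps.shape[0]
    unit = reps / (np.linalg.norm(reps, axis=-1, keepdims=True) + eps)
    # gram[i, j] = batch-averaged inner product between actors i and j.
    gram = np.einsum("ibd,jbd->ijb", unit, unit).mean(axis=2)
    np.fill_diagonal(gram, 0.0)  # an actor is not compared with itself
    return gram.sum(axis=1) / (num_actors - 1)


rng = np.random.default_rng(0)
actors = [Actor(state_dim=4, rep_dim=8, action_dim=2, rng=rng) for _ in range(3)]
states = rng.normal(size=(16, 4))

reps = np.stack([actor.represent(states) for actor in actors])
reg = distinguishability_loss(reps)

original_losses = np.zeros(len(actors))  # placeholder for each actor's own DRL loss
reg_weight = 0.1                         # hypothetical regularization weight
total_losses = original_losses + reg_weight * reg
```

Under these assumptions, actors with identical representations receive a penalty close to 1 and actors with mutually orthogonal representations receive 0, so lowering the term pushes the representation modules apart while leaving the decision modules unconstrained.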