ICRA 2026

Policy Diversification through Representation Distinguishability Regularization for Multi-Actor Deep Reinforcement Learning

Meng XU, Xinhong Chen, Shuguang Wang, Guanyi Zhao, Jianping Wang

AI summary

A lightweight representation distinguishability regularization loss effectively promotes actor diversity and boosts performance across nine state-of-the-art multi-actor DRL methods.

Multi-actor DRL, Actor Diversity, Representation Learning, Regularization, Reinforcement Learning, Policy Optimization

Problem

Existing multi-actor deep reinforcement learning methods struggle to maintain actor diversity, often resulting in redundant policies and suboptimal exploration. Directly maximizing policy divergence is infeasible and can propagate detrimental knowledge across the actor population.

Approach

The method decomposes each actor into a representation module and a decision module, then adds a loss that minimizes the inner product between each actor's representation vector and those of the other actors, encouraging diverse environmental understanding without restricting decision-making.
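The regularizer described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the averaging over the other actors, and the `weight` coefficient are all assumptions, since this summary only states that an inner product between representation vectors is minimized as an additional loss.

```python
from typing import List, Sequence


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("representation vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def distinguishability_penalty(reps: List[Sequence[float]], i: int) -> float:
    """Mean inner product between actor i's representation and every other actor's.

    Assumption: averaging over the other actors is an illustrative choice.
    """
    others = [r for j, r in enumerate(reps) if j != i]
    if not others:
        return 0.0
    return sum(inner_product(reps[i], r) for r in others) / len(others)


def regularized_actor_loss(
    original_loss: float, reps: List[Sequence[float]], i: int, weight: float = 0.1
) -> float:
    """Original actor loss plus the weighted penalty (weight is hypothetical)."""
    return original_loss + weight * distinguishability_penalty(reps, i)


# Three actors' representation vectors for the same state (made-up values).
reps = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(distinguishability_penalty(reps, 0))  # actor 0 overlaps with actor 1
print(distinguishability_penalty(reps, 2))  # actor 2 is orthogonal to both
```

In this toy case the penalty is larger for the actor whose representation duplicates another's, which is the behavior the regularizer is meant to discourage. Whether the paper normalizes the vectors or bounds the inner product is not stated in this summary, so the raw dot product here is only a placeholder.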

Key results

Proposes a generic representation distinguishability regularization loss integrable into existing multi-actor DRL frameworks
Provides theoretical proof linking representation diversity to a reduced gap from the optimal policy
Improves cumulative return across nine state-of-the-art multi-actor DRL baselines on eight MuJoCo and navigation benchmarks
Demonstrates robustness to hyperparameter choices and consistent scalability across varying actor counts

Why it matters

Offers a scalable, model-agnostic solution to the exploration bottleneck in multi-actor reinforcement learning, directly benefiting robotics, autonomous control, and resource allocation applications.

Abstract

Deep reinforcement learning (DRL) has been widely applied to various applications, but improving exploration remains a key challenge. Recently, multi-actor DRL has emerged as a promising approach that enhances exploration by simultaneously deploying multiple actors for learning. Among these methods, actor diversity helps actors discover better policies. However, existing multi-actor DRL methods still lack effective techniques to promote actor diversity, leading to homogeneous, redundant actors and suboptimal policies. To address this, this work proposes a generic solution that can be seamlessly integrated into existing multi-actor DRL methods to promote actor diversity, thereby enabling better policy learning. Specifically, we decompose each actor into a representation module and a decision-making module, where the representation module receives the environment state and outputs a representation vector for the decision module to generate actions. We then compute the difference between each actor’s representation vector and those of all other actors as an additional loss, referred to as representation distinguishability regularization, and train the actor alongside its original loss to promote actor diversity. We demonstrate that our method effectively improves the performance of nine state-of-the-art (SOTA) multi-actor DRL methods across eight benchmark tasks, in terms of return.
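The actor decomposition in the abstract can be pictured with a toy example. The sketch below is only an assumption-laden illustration: the `Actor` class, the bias-free tanh layers, and the weight values are all hypothetical stand-ins for whatever networks the paper actually uses.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def linear_tanh(weights: Matrix, x: Sequence[float]) -> Vector:
    """Bias-free linear layer followed by tanh (toy stand-in for a network)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


class Actor:
    """Hypothetical actor split into a representation module and a decision module."""

    def __init__(self, rep_weights: Matrix, dec_weights: Matrix) -> None:
        self.rep_weights = rep_weights  # maps state -> representation vector
        self.dec_weights = dec_weights  # maps representation vector -> action

    def represent(self, state: Sequence[float]) -> Vector:
        """Representation module: environment state in, representation vector out."""
        return linear_tanh(self.rep_weights, state)

    def act(self, state: Sequence[float]) -> Vector:
        """Decision module applied on top of the representation vector."""
        return linear_tanh(self.dec_weights, self.represent(state))


# Two actors with made-up weights observing the same 2-D state.
actors = [
    Actor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[0.5, 0.5, 0.5]]),
    Actor([[0.0, 1.0], [1.0, 0.0], [1.0, -1.0]], [[0.5, -0.5, 0.5]]),
]
state = [0.3, -0.2]
representations = [a.represent(state) for a in actors]
actions = [a.act(state) for a in actors]
print(representations)
print(actions)
```

The `representations` list is the quantity a regularizer of this kind would compare across actors, while each actor's decision module stays free to map its own representation to actions.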

Index terms

Reinforcement Learning; Representation Learning; Machine Learning for Robot Control