ICRA 2026

EDAIL: Adversarial Imitation Learning Via Exploration-Driven Data Augmentation

Pengcheng Li, Qiang Fang, Xin Xu

AI summary

EDAIL improves adversarial imitation learning from sparse expert demonstrations by adding agent-generated exploratory data to the discriminator's training set and using an asymmetric reward function to stabilize training. With only 1% of the expert data, it reaches an 84.67% success rate on FetchPush.

Adversarial Imitation Learning, Exploration-Driven Augmentation, Sparse Demonstrations, Asymmetric Reward, Robotics, Policy Learning

Problem

Adversarial Imitation Learning struggles with mode collapse and training instability when expert demonstrations are limited, primarily due to poor solution space coverage and discriminator bias from class imbalance.

Approach

The method augments the discriminator's training data with high-confidence state-action pairs generated during agent exploration, and applies an asymmetric surrogate reward function that shifts the reward-penalty boundary to correct the discriminator bias caused by class imbalance.
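
The augmentation step can be illustrated with a minimal sketch. This listing does not say how confidence is measured or how the selected pairs are used, so the sketch assumes that confidence is the discriminator's expert-probability for a pair and that selected pairs are appended to the expert set. The names `select_high_confidence` and `augment_expert_set` and the threshold of 0.9 are hypothetical and are not taken from the EDAIL code.

```python
from typing import Callable, List, Sequence, Tuple

# A (state, action) pair; both are plain sequences of floats in this sketch.
Pair = Tuple[Sequence[float], Sequence[float]]


def select_high_confidence(
    agent_pairs: List[Pair],
    discriminator: Callable[[Pair], float],
    threshold: float = 0.9,  # hypothetical value, not from the paper
) -> List[Pair]:
    """Keep agent pairs whose assumed expert-probability meets the threshold."""
    return [pair for pair in agent_pairs if discriminator(pair) >= threshold]


def augment_expert_set(
    expert_pairs: List[Pair],
    agent_pairs: List[Pair],
    discriminator: Callable[[Pair], float],
    threshold: float = 0.9,
) -> List[Pair]:
    """Return the expert pairs followed by the selected agent pairs."""
    selected = select_high_confidence(agent_pairs, discriminator, threshold)
    return list(expert_pairs) + selected
```

In this sketch, a higher threshold admits fewer agent pairs, which trades coverage of the solution space against the risk of training the discriminator on non-expert behavior.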

Key results

Introduces exploratory policies to supplement sparse expert demonstrations
Designs an asymmetric surrogate reward function to mitigate discriminator bias
Achieves an 84.67% success rate on FetchPush using only 1% of the expert data, 19.27 points above the prior state-of-the-art method
Outperforms baselines across six robotic tasks with faster convergence and lower variance

Why it matters

Enables robust policy learning in robotics and autonomous systems where expert data is scarce, reducing reliance on extensive demonstrations or manual reward engineering.

Abstract

Adversarial Imitation Learning (AIL) is a prominent paradigm in imitation learning that enables policy acquisition from expert demonstrations without relying on manually crafted reward functions. Although AIL has achieved promising results in certain scenarios, many existing methods suffer from mode collapse and training instability when expert demonstrations are limited. Given that agent–environment interactions are often abundant, we focus on effectively leveraging such interaction data to address the above challenges. In this paper, we propose a novel adversarial imitation learning framework called Exploration-Driven Adversarial Imitation Learning (EDAIL). First, we introduce exploratory policies that augment the discriminator’s training data with high-confidence state-action pairs generated by the agent, thereby improving coverage of the solution space under sparse expert data. Second, we design an asymmetric surrogate reward function that shifts the reward-penalty boundary to mitigate discriminator bias caused by class imbalance, enabling more reliable policy optimization. We evaluate our method on six simulated tasks, including robotic manipulation, locomotion, and navigation, using only 1% and 10% of the datasets employed in prior baselines as expert demonstrations. Experimental results show that our method outperforms the baselines, demonstrating both the effectiveness and robustness of our method. In particular, it achieves a success rate of 84.67% on the FetchPush task using only 1% of expert demonstrations, representing an absolute improvement of 19.27 points over the state-of-the-art method. Our code will be available at https://github.com/lipengcheng-nudt/EDAIL.
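
The idea of shifting the reward-penalty boundary can be illustrated with a small sketch. The abstract does not give the functional form of the asymmetric surrogate reward, so what follows is only one possible construction, not the authors' function: a common AIL reward, log D - log(1 - D), is zero at D = 0.5, and subtracting the same quantity evaluated at a chosen boundary moves the zero crossing to that boundary. The function name `shifted_logit_reward` and the default boundary of 0.3 are hypothetical.

```python
import math


def shifted_logit_reward(d: float, boundary: float = 0.3, eps: float = 1e-6) -> float:
    """Reward that is positive above `boundary` and negative below it.

    `d` is the discriminator's expert-probability for a state-action pair.
    `boundary` is a hypothetical parameter; 0.5 recovers the unshifted logit reward.
    """
    # Clamp to avoid log(0) when the discriminator saturates.
    d = min(max(d, eps), 1.0 - eps)

    def logit(p: float) -> float:
        return math.log(p) - math.log(1.0 - p)

    return logit(d) - logit(boundary)
```

With the boundary at 0.3, a pair scored at D = 0.4 receives a positive reward even though an unshifted reward would penalize it. This is the kind of correction that helps when a discriminator trained on few expert samples and many agent samples systematically underestimates D.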

Index terms

Imitation Learning, Learning from Demonstration, Reinforcement Learning