ICRA 2026

Generative Adversarial Imitation Learning for Robot Swarms: Learning from Human Demonstrations and Trained Policies

Mattes Kraus, Jonas Kuckling


AI summary

Key figure (auto-extracted from paper)

Learned swarm policies successfully replicate human-demonstrated collective behaviors and perform comparably to experts in both simulation and real-world deployments.

Robot swarms, Imitation learning, Generative adversarial networks, Human demonstration, Swarm robotics, Decentralized control

Problem

Designing decentralized control for robot swarms is difficult: emergent collective behaviors are hard to specify through manual trial and error, and explicit reward functions often lead to reward hacking. Most existing imitation learning methods for swarms also take their demonstrations from rollouts of a pre-existing expert policy, which creates a bootstrapping paradox for novel swarm tasks where no such policy exists yet.

Approach

The authors introduce SwarmGAIL, which adapts generative adversarial imitation learning to learn decentralized policies from swarm-level features. Human demonstrations are collected with a custom Unity-based tool, and the learned policies are evaluated against both human and PPO-generated expert rollouts.
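The adversarial idea behind this approach can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional swarm-level features, the logistic-regression discriminator, the label convention (expert = 1, policy = 0), and all function names are assumptions chosen for illustration. The sketch trains a discriminator to tell expert features from policy features and then turns its output into a surrogate reward that a policy optimizer could maximize.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_discriminator(expert_feats, policy_feats, steps=500, lr=0.1):
    """Fit D(x) = sigmoid(w . x + b) with labels expert = 1, policy = 0.

    Hypothetical stand-in for the neural discriminator used in GAIL-style
    methods; plain gradient descent on binary cross-entropy.
    """
    x = np.vstack([expert_feats, policy_feats])
    y = np.concatenate([np.ones(len(expert_feats)), np.zeros(len(policy_feats))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        grad = p - y  # gradient of the cross-entropy loss w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def surrogate_reward(feats, w, b, eps=1e-8):
    """Reward that grows as the discriminator rates features as expert-like."""
    d = sigmoid(feats @ w + b)
    return -np.log(1.0 - d + eps)


rng = np.random.default_rng(0)
# Assumed swarm-level features, e.g. (mean inter-robot distance,
# fraction of robots inside a target area). Values are synthetic.
expert_feats = rng.normal(loc=[0.5, 0.9], scale=0.05, size=(200, 2))
policy_feats = rng.normal(loc=[1.5, 0.2], scale=0.05, size=(200, 2))

w, b = train_discriminator(expert_feats, policy_feats)
r_expert = surrogate_reward(expert_feats, w, b).mean()
r_policy = surrogate_reward(policy_feats, w, b).mean()
print(r_expert > r_policy)
```

In a full training loop, the policy would be updated (for example with PPO) to maximize this surrogate reward, new rollouts would be collected, and the discriminator would be retrained, alternating until the policy's swarm-level features become hard to distinguish from the demonstrations.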

Key results

Replicates human-demonstrated collective behaviors across six distinct missions
Achieves real-world deployment performance comparable to simulation and expert demonstrations
Successfully imitates behaviors from both human operators and PPO-trained policies
Provides an open-source demonstration tool for intuitive swarm behavior design

Why it matters

Enables intuitive, reward-free design of emergent swarm behaviors, which could accelerate the development of multi-robot systems for practical applications.

Abstract

In imitation learning, robots learn from demonstrations of the desired behavior. Most work on imitation learning for swarm robotics provides the demonstrations as rollouts of an existing policy. In this work, we provide a framework based on generative adversarial imitation learning that aims to learn collective behaviors from human demonstrations. Our framework is evaluated across six different missions, learning both from manual demonstrations and from demonstrations derived from a PPO-trained policy. Results show that the imitation learning process learns qualitatively meaningful behaviors that perform similarly to the provided demonstrations. Additionally, we deploy the learned policies on a swarm of TurtleBot 4 robots in real-robot experiments. The exhibited behaviors preserve their visually recognizable character, and their performance is comparable to that achieved in simulation.

Index terms

Swarm Robotics, Imitation Learning, Learning from Demonstration