ICRA 2026

R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations

Connor Mattson, Varun Raveendra, Ellen Novoseller, Nicholas Waytowich, Vernon Lawhern, Daniel Brown

AI summary

A single human can effectively train cooperative multi-robot teams by demonstrating to one robot at a time, outperforming methods that require unrealistic synchronized demonstrations.

Multi-agent imitation learning, behavior cloning, single-agent demonstrations, human-robot collaboration, decentralized policy training, robotic teleoperation

Problem

Traditional multi-agent imitation learning assumes humans can simultaneously teleoperate all robots, which is cognitively overwhelming and technically infeasible for complex systems.

Approach

Round-Robin Behavior Cloning (R2BC) cycles through agents, allowing a human to control one robot at a time while others execute their current learned policies, iteratively updating each agent's policy from its own demonstration buffer.
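
The round-robin loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1-D "move toward your goal" environment, the scripted `expert_action` standing in for the human teleoperator, the one-parameter `LinearPolicy`, and the function name `r2bc` are all hypothetical and chosen only to keep the example self-contained.

```python
import random


def expert_action(obs):
    # Hypothetical stand-in for the human teleoperator:
    # step halfway toward the goal (obs = goal - position).
    return 0.5 * obs


class LinearPolicy:
    """Toy one-parameter policy: action = w * obs."""

    def __init__(self):
        self.w = 0.0  # an untrained policy outputs a zero action

    def act(self, obs):
        return self.w * obs

    def fit(self, buffer):
        # Behavior-cloning update for this toy policy:
        # closed-form least squares over (obs, action) pairs.
        num = sum(o * a for o, a in buffer)
        den = sum(o * o for o, _ in buffer)
        if den > 0:
            self.w = num / den


def r2bc(num_agents=3, rounds=6, horizon=5, seed=0):
    rng = random.Random(seed)
    policies = [LinearPolicy() for _ in range(num_agents)]
    buffers = [[] for _ in range(num_agents)]  # one demonstration buffer per agent

    for r in range(rounds):
        demo_agent = r % num_agents  # round-robin choice of the teleoperated agent
        positions = [rng.uniform(-1, 1) for _ in range(num_agents)]
        goals = [rng.uniform(-1, 1) for _ in range(num_agents)]

        for _ in range(horizon):
            for i in range(num_agents):
                obs = goals[i] - positions[i]
                if i == demo_agent:
                    # The "human" controls this agent; record the demonstration.
                    action = expert_action(obs)
                    buffers[i].append((obs, action))
                else:
                    # Every other agent executes its current learned policy.
                    action = policies[i].act(obs)
                positions[i] += action

        # Update only the demonstrated agent, using only its own buffer.
        policies[demo_agent].fit(buffers[demo_agent])

    return policies, buffers
```

Calling `r2bc()` with the defaults gives each of the three agents two demonstration episodes. Because the scripted expert here is itself linear, each toy policy recovers the expert's gain; real tasks would use a learned policy class and human-provided demonstrations instead.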

Key results

Matches or exceeds oracle joint-action behavior cloning across four simulated tasks
Outperforms centralized BC by 3.25x and 5.9x on the physical navigation and block-pushing tasks, respectively
Reduces train-test loss gap compared to joint-action baselines
Successfully deployed on physical robots using real human demonstrations

Why it matters

Enables scalable, realistic human-in-the-loop training for multi-robot systems without requiring complex teleoperation setups or multiple human operators.

Abstract

Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations. Videos, code, and supplemental materials can be found at https://sites.google.com/view/r2bc/home.

Index terms

Learning from Demonstration, Multi-Robot Systems, Imitation Learning