ICRA 2026

MIMIC-D: Multi-Modal Imitation for MultI-Agent Coordination with Decentralized Diffusion Policies

Dayi E. Dong, Maulik Bhatt, Seoyeon Choi, Negar Mehr

AI summary

MIMIC-D enables robots to implicitly coordinate multi-modal tasks without explicit communication or centralized planning by jointly training decentralized diffusion policies.

Multi-agent coordination, Decentralized diffusion, Multi-modal imitation learning, Robot collaboration, Generative policies

Problem

Standard imitation learning collapses multi-modal expert data into single modes, while existing multi-agent diffusion methods require unrealistic centralized planning or explicit communication. This leaves a gap for robust coordination in decentralized real-world deployments.
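
The averaging failure can be illustrated with a toy example. The sketch below is not from the paper: it assumes a one-dimensional lateral offset with two equally likely expert modes (pass on the left at -1, pass on the right at +1), and all function names are hypothetical. A policy fit by mean squared error predicts the mean of the demonstrations, which lies between the modes, whereas drawing from the demonstrated distribution stays close to one of them.

```python
import random

MODES = (-1.0, 1.0)  # assumed modes: pass on the left (-1) or on the right (+1)


def sample_expert_offsets(n, seed=0):
    """Draw n noisy expert offsets; each mode is chosen with probability 0.5."""
    rng = random.Random(seed)
    return [rng.choice(MODES) + rng.gauss(0.0, 0.05) for _ in range(n)]


def mse_optimal_action(actions):
    """The constant prediction that minimizes mean squared error is the sample mean."""
    return sum(actions) / len(actions)


def distance_to_nearest_mode(action):
    """How far an action is from the closest expert mode."""
    return min(abs(action - m) for m in MODES)


demos = sample_expert_offsets(2000)
averaged = mse_optimal_action(demos)        # what a mode-averaging policy outputs
resampled = random.Random(1).choice(demos)  # a draw from the demonstrated distribution

print(f"averaged action:  {averaged:+.3f}, distance to nearest mode: "
      f"{distance_to_nearest_mode(averaged):.3f}")
print(f"resampled action: {resampled:+.3f}, distance to nearest mode: "
      f"{distance_to_nearest_mode(resampled):.3f}")
```

In this toy setting the averaged action corresponds to neither expert behavior, which is the kind of outcome a generative policy that samples from the full distribution is meant to avoid.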

Approach

The method jointly trains individual diffusion-based policies for each agent using shared expert demonstrations, allowing them to learn implicit coordination during training while executing independently using only local observations.
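
The data flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a standard DDPM noise-prediction objective, scalar actions, a shared diffusion step per demonstration, and a joint loss equal to the sum of per-agent denoising losses. `LinearDenoiser`, `joint_training_step`, and `decentralized_sample` are hypothetical names. The linear denoiser ignores the diffusion step and cannot represent multi-modal distributions; it stands in for a neural network only to keep the example self-contained.

```python
import math
import random

T = 100  # number of diffusion steps (assumed)
# Linear noise schedule (assumed); ALPHA_BARS[t] is the cumulative product of (1 - beta).
BETAS = [1e-4 + (0.1 - 1e-4) * t / (T - 1) for t in range(T)]
ALPHA_BARS = []
_running = 1.0
for _beta in BETAS:
    _running *= 1.0 - _beta
    ALPHA_BARS.append(_running)


class LinearDenoiser:
    """Hypothetical stand-in for one agent's noise-prediction network."""

    def __init__(self):
        self.w = [0.0, 0.0, 0.0]  # weights for (noisy action, local observation, bias)

    def predict(self, x_t, obs):
        return self.w[0] * x_t + self.w[1] * obs + self.w[2]

    def sgd_step(self, x_t, obs, eps, lr):
        """One gradient step on the squared noise-prediction error."""
        err = self.predict(x_t, obs) - eps
        feats = (x_t, obs, 1.0)
        self.w = [w - lr * 2.0 * err * f for w, f in zip(self.w, feats)]
        return err * err


def joint_training_step(denoisers, joint_demo, rng, lr=0.01):
    """Update every agent on the same demonstration; return the summed loss.

    joint_demo holds one (local_observation, expert_action) pair per agent,
    all taken from the same expert demonstration.
    """
    t = rng.randrange(T)  # diffusion step shared across agents (an assumption)
    a_bar = ALPHA_BARS[t]
    total = 0.0
    for denoiser, (obs, action) in zip(denoisers, joint_demo):
        eps = rng.gauss(0.0, 1.0)
        # Forward noising: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps
        x_t = math.sqrt(a_bar) * action + math.sqrt(1.0 - a_bar) * eps
        # Each agent sees only its own noisy action and local observation.
        total += denoiser.sgd_step(x_t, obs, eps, lr)
    return total


def decentralized_sample(denoiser, obs, rng):
    """DDPM ancestral sampling for one agent, using its local observation only."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        beta, a_bar = BETAS[t], ALPHA_BARS[t]
        eps_hat = denoiser.predict(x, obs)
        x = (x - beta / math.sqrt(1.0 - a_bar) * eps_hat) / math.sqrt(1.0 - beta)
        if t > 0:
            x += math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
agents = [LinearDenoiser(), LinearDenoiser()]
for _ in range(500):
    mode = rng.choice([-1.0, 1.0])       # the experts jointly pick one coordination mode
    demo = [(1.0, mode), (-1.0, -mode)]  # (local observation, action) per agent
    joint_training_step(agents, demo, rng)

# At execution time each agent samples on its own, with no shared state.
actions = [decentralized_sample(a, obs, rng) for a, obs in zip(agents, (1.0, -1.0))]
```

The point of the sketch is the separation of phases: the agents are coupled only through the shared demonstrations during training, and `decentralized_sample` takes a single agent's denoiser and observation, so nothing is exchanged between agents at run time.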

Key results

Significantly lower collision rates and higher task completion in simulated two-agent swap and three-agent road crossing tasks
Superior replication of multi-modal expert trajectory distributions compared to baseline methods
95% success rate in hardware bimanual basket-lifting manipulation trials
Successful recovery of distinct coordination modes without explicit communication

Why it matters

Enables robust, decentralized multi-robot coordination in real-world scenarios where explicit communication or central control is impractical, advancing human-robot and multi-robot collaboration.

Abstract

As robots become more integrated into society, their ability to coordinate with other robots and humans on multi-modal tasks (those with multiple valid solutions) is crucial. Such behaviors can be learned from expert demonstrations via imitation learning (IL), but when expert demonstrations are multi-modal, standard IL approaches usually average across modes or collapse to a single mode, preventing effective coordination. Inspired by diffusion models’ ability to capture complex multi-modal trajectory distributions in single-agent settings, we develop a diffusion-based framework for coordinated multi-modal behavior in multi-agent systems. However, existing multi-agent diffusion approaches typically require a centralized planner or explicit communication among agents. This assumption can fail in real-world scenarios where robots must operate independently or alongside agents, such as humans, with whom they cannot directly communicate. Therefore, we propose MIMIC-D, a joint training with decentralized execution paradigm for multi-modal multi-agent IL via diffusion. We jointly train all agents’ policies with only local information to achieve implicit coordination. In simulation and hardware experiments, our method exhibits robust multi-modal coordination behavior in various tasks and environments, improving upon state-of-the-art baselines.

All authors are with the Department of Mechanical Engineering, University of California, Berkeley, Berkeley, CA 94709, USA {dayi.dong, maulikbhatt, seoyeon99, negar}@berkeley.edu. This work was supported by the National Science Foundation under Grants ECCS-2438314 (CAREER Award), CNS-2529645, and CCF-2423134, and by the Army Research Laboratory under Grant W911NF-26-1-0002. *Indicates equal contribution.

Index terms

Multi-Robot Systems, Imitation Learning, Cooperating Robots