MIMIC-D: Multi-Modal Imitation for MultI-Agent Coordination with Decentralized Diffusion Policies
Dayi E. Dong, Maulik Bhatt, Seoyeon Choi, Negar Mehr
AI summary
Problem
Standard imitation learning collapses multi-modal expert data into single modes, while existing multi-agent diffusion methods require unrealistic centralized planning or explicit communication. This leaves a gap for robust coordination in decentralized real-world deployments.
Approach
The method jointly trains individual diffusion-based policies for each agent using shared expert demonstrations, allowing them to learn implicit coordination during training while executing independently using only local observations.
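The information structure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear denoiser, the noise schedule, the toy two-mode data, and the names `features`, `train_jointly`, and `sample_action` are all hypothetical. In particular, the summary does not say how the agents are coupled during training, so the summed per-agent denoising loss used here is an assumption; with a plain sum each agent's update is independent of the others, and a linear denoiser cannot itself represent multiple modes. The sketch only shows that training draws every agent's sample from the same joint demonstration, while execution uses each agent's own weights and local observation.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50                                  # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.05, T)      # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def features(noisy_action, local_obs, t):
    """Inputs one agent may see: its noisy action, its local observation,
    the normalized diffusion step, and a bias term."""
    n = noisy_action.shape[0]
    t_col = np.full((n, 1), t / T)
    return np.hstack([noisy_action, local_obs, t_col, np.ones((n, 1))])

def train_jointly(demos_obs, demos_act, steps=2000, lr=0.05, batch=64):
    """Row k of every agent's arrays comes from the same joint demonstration.
    Each agent has its own weights; the logged loss is the sum over agents."""
    n_agents = len(demos_act)
    n = demos_act[0].shape[0]
    weights = [np.zeros((a.shape[1] + o.shape[1] + 2, a.shape[1]))
               for o, a in zip(demos_obs, demos_act)]
    losses = []
    for _ in range(steps):
        t = rng.integers(0, T)               # diffusion step shared by all agents
        idx = rng.integers(0, n, size=batch)  # same demonstrations for all agents
        total = 0.0
        for i in range(n_agents):
            a0 = demos_act[i][idx]
            eps = rng.standard_normal(a0.shape)
            # Forward diffusion: corrupt the expert action with Gaussian noise.
            a_t = np.sqrt(alpha_bars[t]) * a0 + np.sqrt(1.0 - alpha_bars[t]) * eps
            x = features(a_t, demos_obs[i][idx], t)
            err = x @ weights[i] - eps        # noise-prediction error
            total += np.mean(err ** 2)
            weights[i] -= lr * 2.0 * x.T @ err / err.size  # gradient step on the MSE
        losses.append(total)
    return weights, losses

def sample_action(w, local_obs):
    """Decentralized execution: DDPM-style reverse process for one agent,
    using only that agent's weights and local observation."""
    a = rng.standard_normal((local_obs.shape[0], w.shape[1]))
    for t in reversed(range(T)):
        eps_hat = features(a, local_obs, t) @ w
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            a = a + np.sqrt(betas[t]) * rng.standard_normal(a.shape)
    return a

# Toy two-agent demonstrations with two coordination modes (hypothetical data):
# in every demonstration agent 1 takes the opposite side from agent 0.
n_demos = 512
mode = rng.choice([-1.0, 1.0], size=(n_demos, 1))
demos_act = [mode + 0.05 * rng.standard_normal((n_demos, 1)),
             -mode + 0.05 * rng.standard_normal((n_demos, 1))]
demos_obs = [rng.standard_normal((n_demos, 1)) for _ in range(2)]

weights, losses = train_jointly(demos_obs, demos_act)
actions = [sample_action(weights[i], demos_obs[i][:5]) for i in range(2)]
```

A practical policy would replace the linear map with a neural denoiser over action sequences; the point here is only that `sample_action` never receives another agent's observation or action.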
Key results
- Lower collision rates and higher task completion than baseline methods in simulated two-agent swap and three-agent road-crossing tasks
- Superior replication of multi-modal expert trajectory distributions compared to baseline methods
- 95% success rate in hardware bimanual basket-lifting manipulation trials
- Successful recovery of distinct coordination modes without explicit communication
Why it matters
Enables robust, decentralized multi-robot coordination in real-world scenarios where explicit communication or central control is impractical, advancing human-robot and multi-robot collaboration.
Abstract
As robots become more integrated in society, their ability to coordinate with other robots and humans on multi-modal tasks (those with multiple valid solutions) is crucial. Such behaviors can be learned from expert demonstrations via imitation learning (IL), but when expert demonstrations are multi-modal, standard IL approaches usually average across modes or collapse to a single mode, preventing effective coordination. Being inspired by diffusion models’ ability to capture complex multi-modal trajectory distributions in single-agent settings, we develop a diffusion-based framework for coordinated multi-modal behavior in multi-agent systems. However, existing multi-agent diffusion approaches typically require a centralized planner or explicit communication among agents. This assumption can fail in real-world scenarios where robots must operate independently or with agents like humans that they cannot directly communicate with. Therefore, we propose MIMIC-D, a joint training with decentralized execution paradigm for multi-modal multi-agent IL via diffusion. We jointly train all agents’ policies with only local information to achieve implicit coordination. In simulation and hardware experiments, our method exhibits robust multi-modal coordination behavior in various tasks and environments, improving upon state-of-the-art baselines.

All authors are with the Department of Mechanical Engineering, University of California Berkeley, Berkeley, CA 94709, USA {dayi.dong, maulikbhatt, seoyeon99, negar}@berkeley.edu

This work was supported by the National Science Foundation under Grants ECCS-2438314 (CAREER Award), CNS-2529645, and CCF-2423134, and by the Army Research Laboratory under Grant W911NF-26-1-0002.

*Indicates equal contribution.