VFP: Variational Flow-Matching Policy for Multi-Modal Robot Manipulation
Xuanran Zhai, Qianyou Zhao, Qiaojun Yu, Ce Hao
AI summary
Problem
Flow-matching policies accelerate robot action sampling but struggle with multi-modal distributions, often collapsing to averaged or ambiguous behaviors that fail in complex manipulation tasks.
Approach
VFP uses a variational latent prior to identify distinct action modes, applies Kantorovich Optimal Transport to align predicted and expert distributions, and leverages a Mixture-of-Experts decoder for specialized, efficient sampling.
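The mode-averaging problem, and why a mode-identifying latent helps, can be seen in a one-dimensional toy. The sketch below is not the VFP architecture: it assumes two expert action modes at -1 and +1, a linear interpolation path, hand-written ideal velocity fields in place of learned networks, and a hard integer latent `z` in place of a learned variational prior. All names (`MODES`, `marginal_velocity`, `conditional_velocity`, `one_step_sample`) are illustrative.

```python
import random

# Toy 1-D setup (illustrative assumption, not taken from the paper).
MODES = [-1.0, 1.0]  # two equally likely expert action modes


def marginal_velocity(x0: float) -> float:
    """Ideal flow-matching velocity at t=0 with no mode information.

    With the linear path x_t = (1 - t) * x0 + t * x1 and noise x0 drawn
    independently of the action x1, the regression target (x1 - x0)
    averages to E[x1] - x0 at t=0.
    """
    mean_action = sum(MODES) / len(MODES)
    return mean_action - x0


def conditional_velocity(x0: float, z: int) -> float:
    """Ideal velocity at t=0 when a latent z identifies the mode."""
    return MODES[z] - x0


def one_step_sample(x0: float, velocity) -> float:
    """Single Euler step of the ODE from t=0 to t=1."""
    return x0 + velocity(x0)


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(5)]

# Without a latent, every sample lands on the average of the two modes.
collapsed = [one_step_sample(x0, marginal_velocity) for x0 in noise]

# With a latent drawn per sample, each sample lands on its own mode.
latents = [rng.randrange(len(MODES)) for _ in noise]
mode_aware = [
    one_step_sample(x0, lambda x, z=z: conditional_velocity(x, z))
    for x0, z in zip(noise, latents)
]

print(collapsed)
print(list(zip(latents, mode_aware)))
```

In this toy, the unconditioned one-step sampler returns the mean of the two modes, which is an action neither expert would take, while conditioning on `z` recovers each mode in the same single step. Selecting a velocity function by `z` is also a crude stand-in for routing to a specialized expert, under the assumption of hard gating.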
Key results
- 49% relative improvement in task success rate over standard flow-based baselines in simulation
- Higher success counts on 3 real-robot tasks than Diffusion Policy (DP) and FlowPolicy
- Effective modeling of both task-level and path-level multi-modality
- Retains fast single-step ODE inference with a compact model size
Why it matters
Enables reliable, real-time multi-modal robot manipulation for applications requiring diverse, collision-free, or context-dependent behaviors.
Abstract
Flow-matching-based policies have recently emerged as a promising approach for learning-based robot manipulation, offering significant acceleration in action sampling compared to diffusion-based policies. However, conventional flow-matching methods struggle with multi-modality, often collapsing to averaged or ambiguous behaviors in complex manipulation tasks. To address this, we propose the Variational Flow-Matching Policy (VFP), which introduces a variational latent prior for mode-aware action generation and effectively captures both task-level and trajectory-level multi-modality. VFP further incorporates Kantorovich Optimal Transport (K-OT) for distribution-level alignment and utilizes a Mixture-of-Experts (MoE) decoder for mode specialization and efficient inference. We comprehensively evaluate VFP on 41 simulated tasks and 3 real-robot tasks, demonstrating its effectiveness and sampling efficiency in both simulated and real-world settings. Results show that VFP achieves a 49% relative improvement in task success rate over standard flow-based baselines in simulation, and further outperforms them on real-robot tasks, while still maintaining fast inference and a compact model size. More details are available on our project page: https://sites.google.com/view/varfp/