SeFA-Policy: Fast and Accurate Visuomotor Policy Learning with Selective Flow Alignment
Rong Xue, Jiageng Mao, Mingtong Zhang, Yue Wang
AI summary
Problem
Flow-based visuomotor policies accelerate inference but suffer from observation-action inconsistency during iterative distillation. This causes accumulated errors and task failures, while diffusion models remain too slow for real-time control.
Approach
The method introduces a selective flow alignment strategy that uses expert demonstrations to correct generated actions when they closely match ground truth, preserving multimodality while ensuring observation consistency for one-step inference.
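As a rough illustration of the selective correction described above, the sketch below swaps a generated action for its expert action only when the two are close, and keeps distant samples unchanged so that alternative modes survive. The function name `selective_align`, the L2 distance, and the `threshold` value are assumptions made for illustration; the summary does not specify the actual criterion used by the authors.

```python
import numpy as np

def selective_align(generated, expert, threshold):
    """Replace a generated action with its expert action when the two are close.

    generated, expert: arrays of shape (batch, action_dim).
    threshold: hypothetical L2 distance cutoff (an assumption, not from the paper).
    Returns the aligned actions and a boolean mask of the rows that were corrected.
    """
    generated = np.asarray(generated, dtype=float)
    expert = np.asarray(expert, dtype=float)
    # Per-sample L2 distance between generated and expert actions.
    dist = np.linalg.norm(generated - expert, axis=-1)
    close = dist < threshold
    # Close samples snap to the expert action; distant ones are left as generated.
    aligned = np.where(close[:, None], expert, generated)
    return aligned, close

# Toy batch: the first sample is near its expert action, the second is far from it.
generated = np.array([[0.10, 0.00], [1.00, 1.00]])
expert = np.array([[0.00, 0.00], [-1.00, -1.00]])
aligned, corrected = selective_align(generated, expert, threshold=0.5)
```

In this toy batch only the first row is corrected; the second stays as generated, which is the behavior the summary associates with preserving multimodality.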
Key results
- Surpasses state-of-the-art diffusion and flow baselines in accuracy and robustness across 66 simulated tasks
- Reduces inference latency by over 98% compared to Diffusion Policy and AdaFlow
- Achieves 100% success rate on Franka Kitchen tasks using only low-dimensional conditioning
- Maintains action diversity and multimodality while eliminating reflow-induced error accumulation
Why it matters
Provides a scalable, real-time solution for robotic manipulation that bridges the gap between generative modeling efficiency and precise visuomotor control.
Abstract
Developing efficient and accurate visuomotor policies poses a central challenge in robotic imitation learning. While recent rectified flow approaches have advanced visuomotor policy learning, they suffer from a key limitation: after iterative distillation, generated actions may deviate from the ground-truth actions corresponding to the current visual observation, leading to accumulated error as the reflow process repeats and to unstable task execution. We present Selective Flow Alignment (SeFA), an efficient and accurate visuomotor policy learning framework. SeFA resolves this challenge with a selective flow alignment strategy, which leverages expert demonstrations to selectively correct generated actions and restore consistency with observations, while preserving multimodality. This design introduces a consistency correction mechanism that ensures generated actions remain observation-aligned without sacrificing the efficiency of one-step flow inference. Extensive experiments across both simulated and real-world manipulation tasks show that SeFA surpasses state-of-the-art diffusion-based and flow-based policies, achieving superior accuracy and robustness while reducing inference latency by over 98%. By unifying rectified flow efficiency with observation-consistent action generation, SeFA provides a scalable and dependable solution for real-time visuomotor policy learning. Code is available on SeFA code.
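To make "one-step flow inference" concrete, here is a minimal sketch of Euler sampling for a flow model, assuming the learned velocity field has already been straightened. The names `euler_sample` and `make_straight_velocity` are hypothetical, and the toy constant field is a stand-in for a trained, observation-conditioned network, not the authors' model. With a perfectly straight flow, a single Euler step reaches the same action as many small steps, which is the general reason rectified flow permits one network evaluation per action.

```python
import numpy as np

def euler_sample(velocity, noise, obs, num_steps):
    """Integrate da/dt = velocity(a, t, obs) from t=0 to t=1 with Euler steps."""
    a = np.asarray(noise, dtype=float).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        a = a + dt * velocity(a, k * dt, obs)
    return a

def make_straight_velocity(noise, target):
    """Toy field whose path is a straight line from `noise` to `target`.

    Hypothetical stand-in for a trained network; it ignores a, t and obs.
    """
    direction = np.asarray(target, dtype=float) - np.asarray(noise, dtype=float)
    return lambda a, t, obs: direction

noise = np.array([0.5, -0.5])
target = np.array([1.0, 2.0])
v = make_straight_velocity(noise, target)

one_step = euler_sample(v, noise, obs=None, num_steps=1)      # one evaluation
many_steps = euler_sample(v, noise, obs=None, num_steps=100)  # 100 evaluations
```

Both calls land on the same action in this idealized case. A real policy's flow is only approximately straight, so the two results would differ slightly.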