ICRA 2026

SeFA-Policy: Fast and Accurate Visuomotor Policy Learning with Selective Flow Alignment

Rong Xue, Jiageng Mao, Mingtong Zhang, Yue Wang


AI summary


SeFA-Policy achieves real-time, one-step visuomotor control with superior accuracy by selectively aligning flow-generated actions with expert demonstrations to eliminate distillation-induced errors.

Visuomotor policy, Flow matching, Imitation learning, Real-time control, Robotic manipulation, Selective alignment

Problem

Flow-based visuomotor policies accelerate inference but suffer from observation-action inconsistency during iterative distillation. This causes accumulated errors and task failures, while diffusion models remain too slow for real-time control.
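To make this failure mode concrete, here is a minimal sketch of one reflow (distillation) round in a generic rectified-flow setup. It assumes a learned velocity field `velocity(x, t, obs)` that is integrated with Euler steps from noise (t = 0) to action (t = 1). The function names `euler_integrate`, `make_reflow_pairs` and `rectified_flow_target` are hypothetical and not taken from the SeFA code. The point to notice is that each new training target is the model's own generated action rather than the expert action recorded for that observation, so any deviation is carried into the next round.

```python
import numpy as np

def euler_integrate(velocity, x0, obs, num_steps):
    """Integrate dx/dt = velocity(x, t, obs) from t = 0 (noise) to t = 1 (action)."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity(x, k * dt, obs)
    return x

def make_reflow_pairs(velocity, noises, observations, num_steps=20):
    """Build (noise, generated action, observation) triples for one reflow round.

    The target is the model's own output; the expert action for `obs`
    is never consulted, which is how observation-action drift can accumulate.
    """
    pairs = []
    for x0, obs in zip(noises, observations):
        a_gen = euler_integrate(velocity, x0, obs, num_steps)
        pairs.append((x0, a_gen, obs))
    return pairs

def rectified_flow_target(x0, x1, t):
    """Rectified-flow training input x_t on the straight path and its regression target x1 - x0."""
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0
```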

Approach

The method introduces a selective flow alignment strategy that uses expert demonstrations to correct generated actions when they closely match ground truth, preserving multimodality while ensuring observation consistency for one-step inference.
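The selective correction can be sketched as a per-sample rule applied to reflow targets. This is an illustration under stated assumptions, not the paper's implementation: closeness is measured by Euclidean distance, `tau` is a hypothetical threshold, and samples that are not close keep their generated target. SeFA's actual matching criterion and its handling of non-matching samples may differ.

```python
import numpy as np

def selective_align(generated, expert, tau):
    """Hypothetical selective alignment rule.

    generated, expert: arrays of shape (N, action_dim) for the same N observations.
    Where a generated action lies within distance `tau` of the expert action,
    the expert action becomes the training target (restoring observation
    consistency); otherwise the generated action is kept, so samples from
    other action modes are not pulled onto this demonstration.
    """
    generated = np.asarray(generated, dtype=float)
    expert = np.asarray(expert, dtype=float)
    dist = np.linalg.norm(generated - expert, axis=-1)   # per-sample distance
    mask = dist < tau                                    # True = corrected
    targets = np.where(mask[:, None], expert, generated)
    return targets, mask
```

Keeping the distant samples rather than snapping every one onto the demonstration is one way to avoid averaging distinct action modes together; the returned `mask` also reports which samples were corrected.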

Key results

Surpasses state-of-the-art diffusion and flow baselines in accuracy and robustness across 66 simulated tasks
Reduces inference latency by over 98% compared to Diffusion Policy and AdaFlow
Achieves 100% success rate on Franka Kitchen tasks using only low-dimensional conditioning
Maintains action diversity and multimodality while eliminating reflow-induced error accumulation

Why it matters

Provides a scalable, real-time solution for robotic manipulation that bridges the gap between generative modeling efficiency and precise visuomotor control.

Abstract

Developing efficient and accurate visuomotor policies poses a central challenge in robotic imitation learning. While recent rectified flow approaches have advanced visuomotor policy learning, they suffer from a key limitation: after iterative distillation, generated actions may deviate from the ground-truth actions corresponding to the current visual observation, leading to error that accumulates as the reflow process repeats and to unstable task execution. We present Selective Flow Alignment (SeFA), an efficient and accurate visuomotor policy learning framework. SeFA resolves this challenge through a selective flow alignment strategy, which leverages expert demonstrations to selectively correct generated actions and restore consistency with observations while preserving multimodality. This design introduces a consistency correction mechanism that keeps generated actions observation-aligned without sacrificing the efficiency of one-step flow inference. Extensive experiments across both simulated and real-world manipulation tasks show that SeFA surpasses state-of-the-art diffusion-based and flow-based policies, achieving superior accuracy and robustness while reducing inference latency by over 98%. By unifying rectified flow efficiency with observation-consistent action generation, SeFA provides a scalable and dependable solution for real-time visuomotor policy learning. Code is available in the SeFA code repository.
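One-step inference is cheap because a sufficiently straight flow can be integrated with a single Euler step. The sketch below assumes a toy velocity field that points from the current sample straight toward a target action, `(target - x) / (1 - t)`. Along that straight path the velocity is constant, so one step and one hundred steps land on the same action. The field and the names `straight_velocity` and `integrate` are hypothetical stand-ins for a learned, observation-conditioned network.

```python
import numpy as np

def straight_velocity(x, t, target):
    """Toy field whose trajectories are straight lines reaching `target` at t = 1."""
    return (target - x) / (1.0 - t)

def integrate(velocity, x0, cond, num_steps):
    """Euler integration from t = 0 to t = 1 with `num_steps` uniform steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity(x, k * dt, cond)   # one velocity evaluation per step
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal(7)              # e.g. a 7-D action sample
target = np.linspace(-1.0, 1.0, 7)          # stand-in for the observation-conditioned action

one_step = integrate(straight_velocity, noise, target, num_steps=1)
hundred_steps = integrate(straight_velocity, noise, target, num_steps=100)
print(np.allclose(one_step, hundred_steps))  # True
```

Each Euler step costs one forward pass of the velocity network, so cutting tens of steps down to one removes most of the inference cost. Learned flows are only approximately straight, which is why the straightening done by reflow, and the accuracy of the targets it trains on, matter for one-step quality.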

Index terms

Imitation Learning; Visual Learning; Deep Learning in Grasping and Manipulation