KAN We Flow? Advancing Robotic Manipulation with 3D Flow Matching Via KAN & RWKV
Zhihao Chen, Yiyuan Ge, Ziyang Wang
AI summary
Problem
Current generative visuomotor policies rely on heavy UNet-style backbones, causing high latency and compute costs that prevent deployment on resource-constrained robots.
Approach
KAN-We-Flow replaces large UNets with an efficient RWKV-KAN backbone for linear-time temporal mixing and spline-based feature calibration, augmented by an Action Consistency Regularization loss to stabilize one-step action generation.
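To make "spline-based feature calibration" concrete, here is a minimal NumPy sketch of a groupwise learnable 1D mapping. It assumes piecewise-linear splines (via `np.interp`) and one shared curve per channel group. The function name `group_spline`, the knot layout, and the shapes are hypothetical choices for illustration. The actual GroupKAN layer may use a different spline basis and parameterization.

```python
import numpy as np

def group_spline(features, knots, values):
    """Apply one piecewise-linear curve per channel group (illustrative only).

    features: (..., channels) float array
    knots:    (num_knots,) increasing grid shared by all groups (assumption)
    values:   (groups, num_knots) learnable curve heights at the knots
    """
    groups = values.shape[0]
    channels = features.shape[-1]
    if channels % groups != 0:
        raise ValueError("channels must be divisible by the number of groups")
    size = channels // groups
    out = np.empty_like(features, dtype=float)
    for g in range(groups):
        sl = slice(g * size, (g + 1) * size)
        # Every channel in group g is passed through the same learned curve.
        out[..., sl] = np.interp(features[..., sl], knots, values[g])
    return out

rng = np.random.default_rng(0)
knots = np.linspace(-3.0, 3.0, 7)
x = rng.uniform(-2.0, 2.0, size=(2, 8, 12))  # (batch, time, channels)

# Initializing each curve to the identity leaves in-range features unchanged.
identity_values = np.tile(knots, (4, 1))     # 4 groups of 3 channels each
y_identity = group_spline(x, knots, identity_values)

# Scaling one group's curve changes only that group's channels.
scaled_values = identity_values.copy()
scaled_values[0] *= 2.0
y_scaled = group_spline(x, knots, scaled_values)
```

Sharing one curve per group instead of one per channel pair is what keeps the parameter count small in this sketch: the learnable state is only `groups * num_knots` numbers.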
Key results
- Introduces an RWKV-KAN U-shaped backbone for efficient sequence modeling
- Proposes Action Consistency Regularization (ACR) to anchor predictions to expert demonstrations
- Reduces model parameters by 86.8% while enabling real-time one-step inference
- Achieves state-of-the-art success rates across Adroit, Meta-World, and DexArt benchmarks
Why it matters
Enables high-performance, real-time robotic manipulation on edge devices by drastically reducing compute overhead without sacrificing accuracy.
Abstract
Diffusion-based visuomotor policies excel at modeling action distributions but are inference-inefficient, since recursively denoising from noise to policy requires many steps and heavy UNet backbones, which hinders deployment on resource-constrained robots. Flow matching alleviates the sampling burden by learning a one-step vector field, yet prior implementations still inherit large UNet-style architectures. In this work, we present KAN-We-Flow, a flow-matching policy that draws on recent advances in Receptance Weighted Key Value (RWKV) and Kolmogorov-Arnold Networks (KAN) from vision to build a lightweight and highly expressive backbone for 3D manipulation. Concretely, we introduce an RWKV-KAN block: an RWKV first performs efficient time/channel mixing to propagate task context, and a subsequent GroupKAN layer applies learnable spline-based, groupwise functional mappings to perform feature-wise nonlinear calibration of the action mapping on RWKV outputs. Moreover, we introduce an Action Consistency Regularization (ACR), a lightweight auxiliary loss that enforces alignment between predicted action trajectories and expert demonstrations via Euler extrapolation, providing additional supervision to stabilize training and improve policy precision. Without resorting to large UNets, our design reduces parameters by 86.8%, maintains fast runtime, and achieves state-of-the-art success rates on Adroit, Meta-World, and DexArt benchmarks. Our project page can be viewed in link.
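One plausible reading of the flow-matching objective and the ACR term can be sketched in NumPy as follows. The sketch assumes a linear interpolation path between Gaussian noise and the expert action chunk, a velocity target of expert minus noise, and a single Euler step from time t to 1 for the extrapolation. The function names, the `acr_weight` value, and the tensor shapes are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def interpolate(noise, expert, t):
    """Linear path x_t = (1 - t) * noise + t * expert, one t per sample."""
    t = t[:, None, None]
    return (1.0 - t) * noise + t * expert

def flow_matching_loss(v_pred, noise, expert):
    """Mean squared error against the constant target velocity expert - noise."""
    return float(np.mean((v_pred - (expert - noise)) ** 2))

def acr_loss(v_pred, x_t, t, expert):
    """Euler-extrapolate x_t to t = 1 with v_pred, then compare to the expert."""
    t = t[:, None, None]
    x1_hat = x_t + (1.0 - t) * v_pred
    return float(np.mean((x1_hat - expert) ** 2))

def total_loss(v_pred, noise, expert, x_t, t, acr_weight=0.1):
    # acr_weight is a hypothetical hyperparameter chosen for illustration.
    fm = flow_matching_loss(v_pred, noise, expert)
    return fm + acr_weight * acr_loss(v_pred, x_t, t, expert)

rng = np.random.default_rng(0)
batch, horizon, action_dim = 4, 8, 6
expert = rng.normal(size=(batch, horizon, action_dim))  # demonstration actions
noise = rng.normal(size=(batch, horizon, action_dim))   # sample at t = 0
t = rng.uniform(size=batch)
x_t = interpolate(noise, expert, t)

# A perfect velocity prediction drives both terms to (numerically) zero.
v_perfect = expert - noise
loss_perfect = total_loss(v_perfect, noise, expert, x_t, t)

# A trivial all-zero prediction is penalized by both terms.
v_zero = np.zeros_like(expert)
loss_zero = total_loss(v_zero, noise, expert, x_t, t)

# One-step inference: a single Euler step of size 1 starting from noise.
one_step_action = noise + v_perfect
```

Under these assumptions the ACR term supervises the extrapolated endpoint directly in action space, which complements the velocity-space error of the flow-matching term and matches the one-step sampling used at inference time.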