Flow before Imitation: Learning Dexterous In-Hand Manipulation with Dynamic Visuotactile Shortcut Policy
Yijin Chen, Wenqiang Xu, Zhenjun Yu, Tutian Tang, Yutong Li, Siqiong Yao, Cewu Lu
AI summary
Problem
Dexterous in-hand manipulation faces challenges from complex contact dynamics and partial observability, while existing visuotactile methods rely on static fusion or bulky sensors that hinder real-world adaptability.
Approach
FBI extracts tactile information from temporal object motion flow using a dynamics-aware latent model, dynamically fuses it with visual inputs via a transformer, and trains a one-step shortcut diffusion policy for real-time execution.
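To see why a shortcut formulation permits one-step execution, consider the toy sketch below. It assumes the general shortcut-model idea of conditioning the predicted update on the step size d, so that a single step with d = 1 lands where many small steps would. The decay ODE, the constant `K`, and all function names are hypothetical stand-ins chosen so the exact answer is known in closed form; none of this is FBI's actual policy network.

```python
import math

K = 2.0  # decay rate of the toy ODE dx/dt = -K * x (hypothetical choice)


def instantaneous_velocity(x: float) -> float:
    """Velocity at the current state, as an ordinary flow model would predict."""
    return -K * x


def shortcut_velocity(x: float, d: float) -> float:
    """Average velocity over a step of size d, i.e. (x(t + d) - x(t)) / d.

    This is the closed form for the toy ODE; a learned shortcut model
    would approximate this quantity with a network conditioned on d.
    """
    return x * (math.exp(-K * d) - 1.0) / d


def integrate(x0: float, step_fn, num_steps: int) -> float:
    """Advance from t = 0 to t = 1 in num_steps equal steps."""
    d = 1.0 / num_steps
    x = x0
    for _ in range(num_steps):
        x = x + d * step_fn(x, d)
    return x


x0 = 1.0
exact = x0 * math.exp(-K)

# Plain Euler integration needs many steps to get close to the exact answer.
euler_1 = integrate(x0, lambda x, d: instantaneous_velocity(x), 1)
euler_128 = integrate(x0, lambda x, d: instantaneous_velocity(x), 128)

# A step-size-aware update reaches the exact answer in a single step.
shortcut_1 = integrate(x0, shortcut_velocity, 1)

print(f"exact={exact:.4f} euler_1={euler_1:.4f} "
      f"euler_128={euler_128:.4f} shortcut_1={shortcut_1:.4f}")
```

In this toy setting the single shortcut step matches the exact solution, while single-step Euler overshoots badly. That gap is the motivation for training a step-size-conditioned policy when inference must run in real time.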
Key results
- Dynamic visuotactile fusion enables dual vision-only and visuo-tactile operational modes
- Achieves 64.7% to 66.5% average simulation success, surpassing prior SOTA by up to 18.4%
- Delivers 33.5% to 35.0% real-world success rates across in-hand and Adroit benchmark tasks
- Flow2Tactile module predicts dense contact states from point cloud flow with 85.5% accuracy
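As a rough illustration of the quantities behind the Flow2Tactile result, the sketch below computes a point cloud flow as per-point displacement between two frames and scores binary contact predictions by accuracy. It assumes known point correspondences and a per-point 0/1 contact label; the helper names and the sample values are hypothetical, and the summary does not state exactly how the 85.5% figure is computed.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def point_flow(prev: List[Point], curr: List[Point]) -> List[Point]:
    """Per-point displacement between two frames with matched point order."""
    if len(prev) != len(curr):
        raise ValueError("frames must contain the same number of points")
    return [(c[0] - p[0], c[1] - p[1], c[2] - p[2]) for p, c in zip(prev, curr)]


def contact_accuracy(predicted: List[int], actual: List[int]) -> float:
    """Fraction of points whose predicted contact state matches the label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equally long")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Two matched frames of a tiny, made-up point cloud.
frame_prev = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_curr = [(0.0, 0.0, 0.5), (1.0, 0.25, 0.0)]
flow = point_flow(frame_prev, frame_curr)

# Made-up predictions and labels, used only to exercise the metric.
accuracy = contact_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(flow, accuracy)
```

A learned model would sit between these two helpers, mapping the flow field to per-point contact predictions.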
Why it matters
It enables robust, real-time dexterous manipulation in sensor-limited environments, advancing practical deployment of robotic hands for complex manipulation tasks.
Abstract
Dexterous in-hand manipulation remains a long-standing challenge in robotics, primarily due to the complex contact dynamics and partial observability. While humans synergize vision and touch for such tasks, robotic approaches often prioritize one modality, therefore limiting adaptability. This paper introduces Flow Before Imitation (FBI), a visuo-tactile imitation learning framework that dynamically fuses tactile interactions with visual observations through motion dynamics. Unlike prior static fusion methods, FBI establishes a causal link between tactile signals and object motion via a dynamics-aware latent model. FBI employs a transformer-based interaction module to fuse flow-derived tactile features with visual inputs, training a one-step diffusion policy for real-time execution. Extensive experiments demonstrate that the proposed method outperforms the baseline methods in both simulation and the real world on two customized in-hand manipulation tasks and three standard dexterous manipulation tasks. Code, models, and more results are available on the website https://sites.google.com/view/dex-fbi.
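One plausible reading of the transformer-based interaction module is cross-attention in which visual tokens query the flow-derived tactile tokens. The sketch below shows that mechanism in plain Python under that assumption. It omits the learned projections, multiple heads, and residual connections of a real transformer layer, and the token values are made up; the abstract does not specify the actual architecture.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query mixes the values by similarity."""
    dim = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused


# Hypothetical 2-D tokens: two visual queries attend over three tactile tokens.
visual_tokens = [[1.0, 0.0], [0.0, 1.0]]
tactile_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

fused = cross_attention(visual_tokens, tactile_tokens, tactile_tokens)
print(fused)
```

Because the attention weights depend on the current tokens, the mixing changes from one timestep to the next, in contrast to a fixed concatenation of the two modalities.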