ManipForce: Force-Guided Policy Learning with Frequency-Aware Representation for Contact-Rich Manipulation
Geonhyup Lee, Youngjin Lee, Kangmin Kim, Seongju Lee, Sangjun Noh, Seunghyeok Back, Kyoobin Lee
AI summary
Problem
Existing imitation learning methods for contact-rich manipulation rely on vision-only demonstrations, missing critical force cues, while collecting high-fidelity multimodal data remains difficult and costly.
Approach
The authors introduce a handheld system to capture natural human demonstrations with synchronized RGB and high-frequency force-torque signals, and propose the Frequency-Aware Multimodal Transformer (FMT) to fuse these asynchronous inputs using frequency-aware embeddings and cross-attention within a diffusion policy.
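As a rough illustration of the fusion idea described above, the sketch below gives each RGB and F/T token a timestamp-based encoding and then lets each modality attend to the other. It is not the authors' implementation. The sinusoidal timestamp encoding, the single-head attention without learned projections, the residual sum, the sampling rates, and all function names (`time_encoding`, `cross_attend`, `fuse`) are assumptions chosen for illustration.

```python
import math

def time_encoding(t, dim):
    """Sinusoidal encoding of timestamp t (seconds) into `dim` values (dim must be even).
    Assumption: a stand-in for the paper's frequency-aware embedding, whose exact form is not given here."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (100.0 ** (2 * i / dim))
        enc.extend([math.sin(t * freq), math.cos(t * freq)])
    return enc

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Single-head scaled dot-product attention; `context` serves as both keys and values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, context)) for i in range(d)])
    return out

def add_time(tokens, times):
    """Add the timestamp encoding to each token so both streams share one time axis."""
    dim = len(tokens[0])
    return [[x + e for x, e in zip(tok, time_encoding(t, dim))]
            for tok, t in zip(tokens, times)]

def fuse(rgb_tokens, rgb_times, ft_tokens, ft_times):
    """Bi-directional cross-attention: RGB attends to F/T and F/T attends to RGB."""
    rgb = add_time(rgb_tokens, rgb_times)
    ft = add_time(ft_tokens, ft_times)
    rgb_ctx = cross_attend(rgb, ft)   # RGB queries, F/T keys and values
    ft_ctx = cross_attend(ft, rgb)    # F/T queries, RGB keys and values
    rgb_fused = [[a + b for a, b in zip(r, c)] for r, c in zip(rgb, rgb_ctx)]
    ft_fused = [[a + b for a, b in zip(f, c)] for f, c in zip(ft, ft_ctx)]
    return rgb_fused, ft_fused

# Hypothetical rates: 2 RGB tokens and 8 F/T tokens covering roughly the same 0.1 s window.
rgb_times = [0.0, 0.1]
ft_times = [i * 0.0125 for i in range(8)]
rgb_tokens = [[0.1 * (i + j) for j in range(4)] for i in range(2)]
ft_tokens = [[0.05 * (i - j) for j in range(4)] for i in range(8)]

rgb_fused, ft_fused = fuse(rgb_tokens, rgb_times, ft_tokens, ft_times)
print(len(rgb_fused), len(ft_fused))
```

Because attention operates on token sets of any length, the two streams do not need to be resampled to a common rate before fusion. Each fused sequence keeps the length of its own modality.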
Key results
- 83% average success rate across six real-world contact-rich tasks
- Substantial performance gains over RGB-only baselines (22% average success)
- High-frequency F/T sensing and bi-directional cross-attention shown by ablations to be critical for precise contact control
- Direct real-world transfer of human demonstrations with sub-millimeter pose accuracy and gravity-compensated force measurements
Why it matters
Provides a practical, open-source framework for capturing and learning from critical haptic feedback, advancing robust robotic assembly and precision manipulation.
Abstract
Contact-rich manipulation tasks such as precision assembly require precise control of interaction forces, yet existing imitation learning methods rely mainly on vision-only demonstrations. We propose ManipForce, a handheld system designed to capture high-frequency force–torque (F/T) and RGB data during natural human demonstrations for contact-rich manipulation. Building on these demonstrations, we introduce the Frequency-Aware Multimodal Transformer (FMT). FMT encodes asynchronous RGB and F/T signals using frequency- and modality-aware embeddings and fuses them via bi-directional cross-attention within a transformer diffusion policy. Through extensive experiments on six real-world contact-rich manipulation tasks (such as gear assembly, box flipping, and battery insertion), FMT trained on ManipForce demonstrations achieves robust performance with an average success rate of 83% across all tasks, substantially outperforming RGB-only baselines. Ablation and sampling-frequency analyses further confirm that incorporating high-frequency F/T data and cross-modal integration improves policy performance, especially in tasks demanding high precision and stable contact. Hardware, software, and video demos are available at: https://sites.google.com/view/manipforce/.
1 Department of AI Convergence, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea.
2 Department of AI Machinery, Korea Institute of Machinery & Materials (KIMM), Daejeon 34103, Republic of Korea.
† Corresponding author: Kyoobin Lee (kyoobinlee@gist.ac.kr)