Spatially-Anchored Tactile Awareness for Robust Dexterous Manipulation
Jialei Huang, Yang Ye, Yuanqing Gong, Xuezhou Zhu, Yang Gao, Kaifeng Zhang
AI summary
Problem
Current visuo-tactile learning methods fail to leverage the spatial relationship between tactile signals and hand kinematics, causing poor geometric reasoning and task failure in contact-rich, high-precision scenarios.
Approach
SaTA anchors raw tactile features to the hand’s kinematic coordinate system using forward kinematics and Fourier encoding, then fuses this spatial context with tactile data to directly predict manipulation actions without object models.
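The anchoring step described above can be illustrated with a minimal sketch. It assumes a sinusoidal Fourier encoding with power-of-two frequencies and a simple concatenation as the fusion step; the summary does not specify either choice. The function names, the number of frequency bands, and the sample values are all hypothetical.

```python
import math

def fourier_encode(position, num_bands=4):
    """Encode each coordinate x as sin/cos pairs at frequencies 2^k * pi, k = 0..num_bands-1.

    The frequency schedule is an assumption for illustration only.
    """
    encoded = []
    for x in position:
        for k in range(num_bands):
            freq = (2.0 ** k) * math.pi
            encoded.append(math.sin(freq * x))
            encoded.append(math.cos(freq * x))
    return encoded

def anchor_tactile(tactile_features, fingertip_position, num_bands=4):
    """Fuse tactile features with the encoded sensor position by concatenation (assumed)."""
    return list(tactile_features) + fourier_encode(fingertip_position, num_bands)

# Hypothetical inputs: a 4-D tactile feature vector and a fingertip
# position in metres, expressed in the hand's kinematic frame.
tactile = [0.12, -0.40, 0.88, 0.05]
fingertip = [0.031, -0.012, 0.094]

fused = anchor_tactile(tactile, fingertip)
print(len(fused))  # 4 tactile values + 3 coordinates * 4 bands * 2 = 28
```

The encoding lifts a low-dimensional position into a higher-dimensional vector, which is a common way to help a network resolve small positional differences; whether SaTA uses this exact form is not stated here.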
Key results
- Up to 30% higher success rates and 27% faster completion than baselines
- Successful sub-millimeter precision on USB-C mating and bulb installation
- First-contact success rate doubles to 48.3%, indicating more precise initial alignment
- Ablation confirms spatial anchoring drives all major performance gains
Why it matters
Provides a general, model-free framework for learning high-precision dexterous manipulation, enabling robots to perform delicate assembly and alignment tasks previously reserved for traditional model-based control.
Abstract
Dexterous manipulation requires precise geometric reasoning, yet existing visuo-tactile learning methods struggle with sub-millimeter precision tasks that are routine for traditional model-based approaches. We identify a key limitation: while tactile sensors provide rich contact information, current learning frameworks fail to effectively leverage both the perceptual richness of tactile signals and their spatial relationship with hand kinematics. We believe an ideal tactile representation should explicitly ground contact measurements in a stable reference frame while preserving detailed sensory information, enabling policies to not only detect contact occurrence but also precisely infer object geometry in the hand’s coordinate system. We introduce SaTA (Spatially-anchored Tactile Awareness for dexterous manipulation), an end-to-end policy framework that explicitly anchors tactile features to the hand’s kinematic frame through forward kinematics, enabling accurate geometric reasoning without requiring object models or explicit pose estimation. Our key insight is that spatially-grounded tactile representations allow policies to not only detect contact occurrence but also precisely infer object geometry in the hand’s coordinate system. We validate SaTA on challenging dexterous manipulation tasks, including bimanual USB-C mating in free space (a task demanding sub-millimeter alignment precision), as well as light bulb installation requiring precise thread engagement and rotational control, and card sliding that demands delicate force modulation and angular precision. These tasks represent significant challenges for learning-based methods due to their stringent precision requirements. Across multiple benchmarks, SaTA significantly outperforms strong visuo-tactile baselines, improving success rates by up to 30% while reducing task completion times by 27%.
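To make concrete what anchoring a tactile sensor to the hand's kinematic frame through forward kinematics involves, the sketch below computes a fingertip pose from joint angles. It is a deliberately simplified planar, three-joint finger; a real dexterous hand needs full 3D transforms. The function name, link lengths, and joint angles are hypothetical and do not come from the paper.

```python
import math

def planar_finger_fk(joint_angles, link_lengths, base=(0.0, 0.0)):
    """Forward kinematics for a planar serial finger (simplifying assumption).

    Returns the fingertip position (x, y) and orientation theta in the
    hand frame, given joint angles in radians and link lengths in metres.
    """
    x, y = base
    theta = 0.0
    for angle, length in zip(joint_angles, link_lengths):
        theta += angle                 # joint rotations accumulate along the chain
        x += length * math.cos(theta)  # advance along the current link direction
        y += length * math.sin(theta)
    return x, y, theta

# Hypothetical finger: three links of 5 cm, 3 cm, and 2 cm.
links = [0.05, 0.03, 0.02]

# Slightly curled finger: each joint flexed by 20 degrees.
angles = [math.radians(20.0)] * 3
tip_x, tip_y, tip_theta = planar_finger_fk(angles, links)
print(tip_x, tip_y, tip_theta)
```

Because the fingertip pose is computed from joint readings alone, a tactile measurement taken at that fingertip can be placed in the hand frame without an object model or an explicit pose estimator, which is the property the abstract emphasizes.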