ConTact: Contrastive Tactile Alignment for Sim-To-Real Robotic Manipulation
Yanlin Lai, Yinzhao Dong, Chun Yuan, Cheng Zhou
AI summary
Problem
Tactile-driven robotic manipulation struggles with the computational cost of high-fidelity simulation and the difficulty of bridging the sim-to-real gap for tactile sensor data.
Approach
The authors propose a fast ray-casting tactile simulator for efficient GPU training and a contrastive learning framework that aligns simulated and real tactile features using a dedicated spatiotemporal encoder.
Key results
- Fast ray-casting tactile simulator for GPU training
- Contrastive alignment of simulated and real tactile features
- Zero-shot sim-to-real transfer on tracking tasks
- Spatiotemporal encoder captures dynamic contact features
Why it matters
Provides a computationally efficient and robust pathway for deploying tactile-based reinforcement learning policies directly onto physical robots in complex manipulation tasks.
Abstract
Deep reinforcement learning (DRL) has achieved remarkable success in robot control. However, DRL with tactile feedback still faces challenges in contact-rich tasks involving visual occlusion or high-speed dynamics. The challenges stem from two primary sources. First, the complexity and diversity of real-world tactile sensors make them difficult to simulate and transfer to reality. Second, existing high-fidelity simulators are often too computationally intensive for large-scale DRL, forcing a trade-off between accuracy and speed. To address this, we design a high-speed tactile simulation model for tactile arrays enabling efficient, large-scale DRL training on GPUs. We then propose the Contrastive Tactile (ConTact) framework, which leverages contrastive learning to align tactile features for sim-to-real transfer. ConTact employs a dedicated spatiotemporal encoder that explicitly models temporal changes to capture the dynamic features of contact events. We then validate it on two kinds of manipulation tasks, Single and Composite Object Tracking (SOT/COT), which rely solely on tactile information and proprioception. Moreover, policies trained with ConTact from simulation are directly deployed in the real world without finetuning, achieving zero-shot transfer.
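The abstract does not state which contrastive objective ConTact uses. As an illustration of what aligning simulated and real tactile features can look like, the following is a minimal sketch of a symmetric InfoNCE loss, a common choice for this kind of alignment. It assumes that row i of the simulated batch and row i of the real batch are encoder outputs for the same contact event. The function names, the temperature value, and the pairing scheme are assumptions made for illustration, not details taken from the paper.

```python
import math


def _normalize(vec):
    """Scale a feature vector to unit length (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diag_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(sim_feats, real_feats, temperature=0.1):
    """Symmetric InfoNCE loss over paired (simulated, real) feature batches.

    Hypothetical helper: sim_feats[i] and real_feats[i] are assumed to be
    features of the same contact event; all other rows act as negatives.
    """
    sim = [_normalize(v) for v in sim_feats]
    real = [_normalize(v) for v in real_feats]
    # Cosine similarity between every simulated and every real feature.
    logits = [
        [sum(a * b for a, b in zip(s, r)) / temperature for r in real]
        for s in sim
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the sim-to-real and real-to-sim directions.
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(transposed))
```

Minimizing a loss of this form pulls matching simulated and real features together and pushes mismatched pairs apart, so the loss is small when the two feature sets are already aligned and large when the pairing is scrambled.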