ICRA 2026

ConTact: Contrastive Tactile Alignment for Sim-To-Real Robotic Manipulation

Yanlin Lai, Yinzhao Dong, Chun Yuan, Cheng Zhou

AI summary

A contrastive tactile alignment framework enables zero-shot sim-to-real transfer for contact-rich robotic manipulation without fine-tuning.

Tactile sensing, Sim-to-real transfer, Contrastive learning, Robotic manipulation, Deep reinforcement learning, Zero-shot transfer

Problem

Tactile-driven robotic manipulation struggles with the computational cost of high-fidelity simulation and the difficulty of bridging the sim-to-real gap for tactile sensor data.

Approach

The authors propose a fast ray-casting tactile simulator for efficient GPU training and a contrastive learning framework that aligns simulated and real tactile features using a dedicated spatiotemporal encoder.
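
The ray-casting idea can be illustrated with a minimal NumPy sketch. Everything in it is an assumption for illustration, not the authors' simulator: the taxels lie on a flat pad with unit outward normals, the contacted object is a single sphere, each taxel casts one ray from `max_depth` below its surface along its normal, and pressure is a linear spring, `stiffness * penetration`. The function name `raycast_taxels` and all parameter values are hypothetical.

```python
import numpy as np


def raycast_taxels(taxel_pos, taxel_normal, sphere_center, sphere_radius,
                   max_depth=0.005, stiffness=1000.0):
    """Hypothetical ray-cast pressure model for a taxel array touching a sphere.

    taxel_pos:    (N, 3) taxel positions on the undeformed sensor surface [m]
    taxel_normal: (N, 3) unit outward normals
    Returns (N,) pressures: stiffness * penetration depth, clipped to max_depth.
    """
    # Start each ray max_depth below the surface and shoot it outward.
    origins = taxel_pos - max_depth * taxel_normal
    oc = origins - sphere_center
    # Ray-sphere intersection: t^2 + 2 b t + c = 0 for unit ray directions.
    b = np.einsum("ij,ij->i", oc, taxel_normal)
    c = np.einsum("ij,ij->i", oc, oc) - sphere_radius ** 2
    disc = b * b - c
    hit = disc >= 0.0
    sqrt_disc = np.sqrt(np.where(hit, disc, 0.0))
    t_near = -b - sqrt_disc
    t_far = -b + sqrt_disc
    hit &= t_far >= 0.0                # ignore spheres entirely behind the origin
    t_hit = np.maximum(t_near, 0.0)    # origin inside the sphere -> saturate
    # Penetration = how far the object surface sits below the sensor surface.
    depth = np.where(hit, np.clip(max_depth - t_hit, 0.0, max_depth), 0.0)
    return stiffness * depth


# 3x3 pad in the z = 0 plane, 2 mm pitch, normals pointing +z.
xs = np.array([-0.002, 0.0, 0.002])
gx, gy = np.meshgrid(xs, xs, indexing="ij")
pos = np.stack([gx.ravel(), gy.ravel(), np.zeros(9)], axis=1)
nrm = np.tile([0.0, 0.0, 1.0], (9, 1))

# A 1 cm sphere pressed 1 mm into the pad center.
pressures = raycast_taxels(pos, nrm, np.array([0.0, 0.0, 0.009]), 0.01)
print(pressures.reshape(3, 3))
```

Because each taxel is an independent ray test expressed as array operations, the computation batches naturally across many taxels and parallel environments, which is presumably what makes ray casting attractive for GPU training compared with a deformable-contact solver.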

Key results

Fast ray-casting tactile simulator for GPU training
Contrastive alignment of simulated and real tactile features
Zero-shot sim-to-real transfer on tracking tasks
Spatiotemporal encoder captures dynamic contact features
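
The summary does not describe the encoder's architecture. One simple way to expose temporal changes to an encoder, assumed here purely for illustration, is to feed it each tactile frame together with its difference from the previous frame. The function name `spatiotemporal_stack` and the toy 2x2 history are hypothetical.

```python
import numpy as np


def spatiotemporal_stack(frames):
    """Pair each tactile frame with its change since the previous frame.

    frames: (T, H, W) history of taxel readings, oldest first.
    Returns (T, 2, H, W): channel 0 is the raw frame, channel 1 is the
    frame-to-frame difference (zero at the first step, which has no
    predecessor). A downstream encoder would consume this tensor.
    """
    frames = np.asarray(frames, dtype=np.float32)
    # Prepending the first frame makes the first difference exactly zero.
    diffs = np.diff(frames, axis=0, prepend=frames[:1])
    return np.stack([frames, diffs], axis=1)


# Three 2x2 frames: no contact, contact onset at one taxel, contact spreading.
history = np.array([
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.5], [0.0, 0.0]],
    [[0.2, 0.9], [0.0, 0.1]],
])
features = spatiotemporal_stack(history)
print(features.shape)
```

The difference channel is nonzero only where contact is changing, so a contact onset or slip stands out even when the absolute pressure is small.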

Why it matters

Provides a computationally efficient and robust pathway for deploying tactile-based reinforcement learning policies directly onto physical robots in complex manipulation tasks.

Abstract

Deep reinforcement learning (DRL) has achieved remarkable success in robot control. However, DRL with tactile feedback still faces challenges in contact-rich tasks involving visual occlusion or high-speed dynamics. The challenges stem from two primary sources. First, the complexity and diversity of real-world tactile sensors make them difficult to simulate and to transfer to reality. Second, existing high-fidelity simulators are often too computationally intensive for large-scale DRL, forcing a trade-off between accuracy and speed. To address these challenges, we design a high-speed tactile simulation model for tactile arrays that enables efficient, large-scale DRL training on GPUs. We then propose the Contrastive Tactile (ConTact) framework, which leverages contrastive learning to align tactile features for sim-to-real transfer. ConTact employs a dedicated spatiotemporal encoder that explicitly models temporal changes to capture the dynamic features of contact events. We validate ConTact on two kinds of manipulation tasks, Single and Composite Object Tracking (SOT/COT), which rely solely on tactile information and proprioception. Moreover, policies trained with ConTact in simulation are deployed directly in the real world without fine-tuning, achieving zero-shot transfer.
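
The abstract does not specify the contrastive objective. A common choice, assumed here purely for illustration, is a symmetric InfoNCE loss over a batch of paired simulated and real tactile embeddings: matched pairs are pulled together and all other pairings in the batch are pushed apart. The pairing assumption, the temperature of 0.1, and the names `info_nce` and `_log_softmax` are hypothetical; the random vectors stand in for encoder outputs.

```python
import numpy as np


def _log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def info_nce(z_sim, z_real, temperature=0.1):
    """Symmetric InfoNCE between B paired sim/real embeddings of shape (B, D).

    Row i of z_sim and row i of z_real are assumed to come from the same
    contact condition (positive pair); every other row in the batch is a negative.
    """
    z_sim = z_sim / np.linalg.norm(z_sim, axis=1, keepdims=True)
    z_real = z_real / np.linalg.norm(z_real, axis=1, keepdims=True)
    logits = z_sim @ z_real.T / temperature   # cosine similarity / temperature
    idx = np.arange(logits.shape[0])
    # Cross-entropy with the diagonal as the target, in both directions.
    loss_sim_to_real = -_log_softmax(logits, axis=1)[idx, idx].mean()
    loss_real_to_sim = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_sim_to_real + loss_real_to_sim)


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
mismatched = info_nce(z, z[::-1])   # every row paired with the wrong partner
print(aligned, mismatched)
```

In training, a loss like this would be minimized with respect to the encoder parameters, so that a policy consuming the encoder's features sees similar inputs whether the tactile data came from the simulator or from the real sensor.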

Index terms

Reinforcement Learning, Force and Tactile Sensing, Machine Learning for Robot Control