Automatic Physically-Based Sim2Real for Tactile Images through Differentiable Path-Tracing Rendering
Guillaume Duret, Anna Samsonenko, Florence Zara, Jan Peters, Liming Chen
AI summary
Problem
High-fidelity simulation of vision-based tactile sensors suffers from a persistent sim-to-real gap. It has two causes: unmodeled optical effects, such as refraction through the protective glass layer, and the need to hand-tune physical parameters such as camera pose and lighting.
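To make the refraction problem concrete, the sketch below applies the vector form of Snell's law to a camera ray that crosses a flat protective plate before reaching the gel. It is a minimal illustration under stated assumptions. The plate is horizontal and of uniform thickness, and the refractive indices are 1.0 for air, 1.5 for glass and 1.41 for the silicone gel. These values, the 2 mm thickness, and the helper names refract and through_glass are hypothetical and do not come from the paper. The lateral shift it returns is the kind of displacement that a renderer without a glass layer misses.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def refract(d: Vec3, n: Vec3, eta: float) -> Optional[Vec3]:
    """Vector form of Snell's law.

    d:   unit direction of the incoming ray
    n:   unit surface normal pointing back towards the incoming ray
    eta: ratio n_incident / n_transmitted
    Returns the unit refracted direction, or None on total internal reflection.
    """
    cos_i = -dot(n, d)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n))


def through_glass(d: Vec3, thickness: float, n_air: float = 1.0,
                  n_glass: float = 1.5, n_gel: float = 1.41):
    """Trace a camera ray through a flat glass plate into the gel below it.

    Assumption: both plate faces are horizontal with normal +z facing the camera.
    Returns the ray direction inside the gel and the lateral (x, y) shift of the
    exit point relative to a ray that ignored the glass.
    """
    up = (0.0, 0.0, 1.0)
    in_glass = refract(d, up, n_air / n_glass)  # air -> glass never totally reflects
    in_gel = refract(in_glass, up, n_glass / n_gel)
    # horizontal distance covered inside the plate, refracted minus straight-line
    shift = tuple(
        thickness * (g / -in_glass[2] - s / -d[2])
        for g, s in zip(in_glass[:2], d[:2])
    )
    return in_gel, shift


# Hypothetical ray leaving the camera 30 degrees off axis, 2 mm plate
theta = math.radians(30.0)
ray = (math.sin(theta), 0.0, -math.cos(theta))
direction, shift = through_glass(ray, thickness=2.0)
print(direction, shift)
```

The two faces are parallel, so the final direction obeys Snell's law between air and gel directly. The glass adds only a thickness-dependent lateral shift to every ray.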
Approach
The authors develop a fully differentiable rendering pipeline that optimizes critical simulation parameters (camera pose, lighting, and object texture) directly from as few as three real images. They pair it with a fast image-to-image translation model that uses high-fidelity simulated data with NOCS maps as input, enabling rapid inference.
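A toy differentiable renderer can illustrate the idea of fitting simulation parameters to images by gradient descent. The minimal sketch below rests on several assumptions. The surface is Lambertian with known normals. A single directional light is described by an ambient level, an intensity, and a tilt angle theta. The gradients are analytic and written by hand. It is not a path tracer, and camera pose, texture and glass are omitted. The functions render and fit_lighting and all numerical values are hypothetical. A synthetic image rendered with known parameters stands in for a real capture. Gradient descent on the mean squared photometric error should then recover those parameters.

```python
import math


def render(params, normals):
    """Toy differentiable renderer: ambient + Lambertian shading from one directional light.

    params = (ambient, intensity, theta); the light direction is (sin theta, 0, cos theta).
    Returns per-pixel intensities and per-pixel gradients d(intensity)/d(params).
    """
    ambient, intensity, theta = params
    lx, lz = math.sin(theta), math.cos(theta)
    values, grads = [], []
    for nx, _ny, nz in normals:
        ndotl = nx * lx + nz * lz
        if ndotl > 0.0:
            values.append(ambient + intensity * ndotl)
            # d(n.l)/d(theta) = nx*cos(theta) - nz*sin(theta)
            grads.append((1.0, ndotl, intensity * (nx * lz - nz * lx)))
        else:  # facet faces away from the light: only the ambient term remains
            values.append(ambient)
            grads.append((1.0, 0.0, 0.0))
    return values, grads


def fit_lighting(normals, target, init=(0.0, 0.5, 0.0), lr=0.4, steps=4000):
    """Gradient descent on the mean squared error between rendered and target pixels."""
    params = list(init)
    n = len(target)
    for _ in range(steps):
        values, grads = render(params, normals)
        g = [0.0, 0.0, 0.0]
        for v, t, dv in zip(values, target, grads):
            for j in range(3):
                g[j] += 2.0 * (v - t) * dv[j] / n
        params = [p - lr * gj for p, gj in zip(params, g)]
    return tuple(params)


# Hypothetical normals of a gently curved gel surface on a 7x7 grid
normals = []
for i in range(-3, 4):
    for j in range(-3, 4):
        nx, ny = 0.2 * i, 0.2 * j
        normals.append((nx, ny, math.sqrt(1.0 - nx * nx - ny * ny)))

true_params = (0.1, 0.8, 0.4)
observed, _ = render(true_params, normals)  # stands in for a real sensor image
estimate = fit_lighting(normals, observed)
print([round(p, 3) for p in estimate])
```

A full pipeline would get these gradients from a differentiable path tracer rather than deriving them by hand. It would also compare against real sensor images instead of a synthetic target.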
Key results
- First fully differentiable rendering pipeline for visual tactile sensors
- State-of-the-art sim-to-real accuracy on multi-axis deformation benchmarks
- Novel inverse rendering application for single-image mesh reconstruction
- Near real-time inference via NOCS-based image-to-image translation
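Single-image mesh reconstruction by inverse rendering inverts the same kind of forward model with respect to geometry rather than lighting. The sketch below is a deliberately reduced version under stated assumptions: a 1D height profile, a known oblique light with direction (0.6, 0.8), and Lambertian shading. It fits one slope per pixel by gradient descent on the shading error, starting from the flat, undeformed gel. It then integrates the slopes into heights, with the left edge fixed at zero. The names shade and reconstruct_profile and the Gaussian test dent are hypothetical. The method summarized here reconstructs a full 3D mesh through a path tracer instead.

```python
import math

LIGHT_X, LIGHT_Z = 0.6, 0.8  # hypothetical known light direction (unit length)


def shade(slope):
    """Lambertian shading n.l of a 1D surface element with slope dh/dx, and its derivative."""
    norm = math.sqrt(1.0 + slope * slope)
    s = (LIGHT_Z - LIGHT_X * slope) / norm         # n = (-slope, 1) / norm
    ds = (-LIGHT_X - LIGHT_Z * slope) / norm ** 3  # d(n.l)/d(slope)
    return s, ds


def reconstruct_profile(image, lr=0.5, steps=500):
    """Fit one slope per pixel to the observed shading, then integrate to heights."""
    slopes = [0.0] * len(image)  # start from the undeformed, flat gel
    for _ in range(steps):
        for i, target in enumerate(image):
            s, ds = shade(slopes[i])
            slopes[i] -= lr * 2.0 * (s - target) * ds  # gradient of (s - target)^2
    heights = [0.0]  # left edge assumed undeformed
    for g in slopes:
        heights.append(heights[-1] + g)  # unit pixel spacing
    return heights


# Hypothetical ground truth: a shallow Gaussian dent sampled at 21 points
true_heights = [-0.5 * math.exp(-(((x - 10) / 3.0) ** 2)) for x in range(21)]
true_slopes = [b - a for a, b in zip(true_heights, true_heights[1:])]
image = [shade(g)[0] for g in true_slopes]  # the single "tactile image"
recovered = reconstruct_profile(image)
error = max(abs((t - true_heights[0]) - r) for t, r in zip(true_heights, recovered))
print(f"max height error: {error:.2e}")
```

With one light, shading pins down the slope uniquely only where it changes monotonically with slope. Here that means slopes above -0.75, which is why the fit starts from the flat gel. A realistic reconstruction would usually also need smoothness or shape priors.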
Why it matters
Provides a scalable, automated solution for generating photorealistic tactile data, accelerating the development of data-driven robotic manipulation and tactile perception algorithms.
Abstract
High-fidelity simulation of vision-based tactile sensors is essential for developing data-driven robotic manipulation algorithms. However, a significant sim-to-real gap persists due to the difficulty of modeling complex optical effects, such as refraction through protective glass layers, and of accurately estimating physical parameters like sensor pose and lighting. To bridge this gap, we introduce a novel, fully differentiable pipeline for visual tactile simulation. Leveraging a differentiable path tracer, our method optimizes critical parameters, including camera pose, lighting conditions, and object texture, directly from just three real images. This approach achieves highly realistic simulations with physically accurate light transport and glass refraction. We validate our method through a comprehensive benchmark against real-world data, demonstrating state-of-the-art sim-to-real accuracy. We also enable novel applications, such as mesh reconstruction from a single tactile image via inverse rendering. To overcome the computational cost of path tracing, we further use an image-to-image translation model. This model uses high-fidelity simulated data together with Normalized Object Coordinate Space (NOCS) maps as input, preserving crucial deformation information while enabling rapid inference. The code is available at https://tacdiffrend.github.io/
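A NOCS map colors each surface point by its position inside a normalized unit cube, so geometry and deformation are encoded directly in the RGB channels. The sketch below follows the convention of the original NOCS formulation: the bounding box is centred at (0.5, 0.5, 0.5) and its diagonal is scaled to length 1. As an assumption made only for this illustration, it computes that normalization from the undeformed gel, so an indentation appears as a shift in the depth channel. How the authors actually normalize their NOCS maps is not stated in this summary. The gel dimensions and helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def nocs_normalization(reference: Sequence[Point]) -> Tuple[Point, float]:
    """Centre and scale that map the reference bounding box into the unit cube
    (box centred at (0.5, 0.5, 0.5), box diagonal scaled to length 1)."""
    lo = [min(p[i] for p in reference) for i in range(3)]
    hi = [max(p[i] for p in reference) for i in range(3)]
    center = tuple((a + b) / 2.0 for a, b in zip(lo, hi))
    diagonal = math.sqrt(sum((b - a) ** 2 for a, b in zip(lo, hi)))
    return center, 1.0 / diagonal


def nocs_coords(vertices: Sequence[Point], center: Point, scale: float) -> List[Point]:
    """NOCS coordinates of (possibly deformed) vertices, using the reference normalization."""
    return [tuple((p[i] - center[i]) * scale + 0.5 for i in range(3)) for p in vertices]


def to_rgb(coords: Sequence[Point]) -> List[Tuple[int, int, int]]:
    """Encode NOCS coordinates as 8-bit colors, clamping to the unit cube."""
    return [tuple(round(255 * min(max(c, 0.0), 1.0)) for c in q) for q in coords]


# Hypothetical undeformed gel: a 20 x 20 x 3 (mm) box with its top surface at z = 3
corners = [(x, y, z) for x in (0.0, 20.0) for y in (0.0, 20.0) for z in (0.0, 3.0)]
center, scale = nocs_normalization(corners)
flat, pressed = nocs_coords([(10.0, 10.0, 3.0), (10.0, 10.0, 2.0)], center, scale)
print(flat, pressed, to_rgb([flat, pressed]))
```

The normalization is fixed by the undeformed gel, so the same rest-state point always maps to the same color. Any change in a pixel's NOCS color can then be read as deformation.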