ICRA 2026

TactileAloha: Learning Bimanual Manipulation with Tactile Sensing

Ningquan Gu, Kazuhiro Kosuge, Mitsuhiro Hayashibe

AI summary

Integrating a GelSight tactile sensor with an improved transformer-based policy lets a bimanual robot perform texture-dependent manipulation tasks that vision-only systems fail at, with an average relative improvement of about 11% over the state-of-the-art method that also uses tactile input.
Tactile sensing, bimanual manipulation, imitation learning, GelSight, transformer policy, robotics

Problem

Camera vision alone cannot reliably distinguish fine object textures or orientations required for precise contact, while existing tactile approaches often focus on force rather than texture or rely on limited heuristic pipelines.

Approach

The system mounts a GelSight sensor on the Aloha gripper to capture texture data, encodes it with a pre-trained ResNet, fuses it with visual and proprioceptive inputs, and trains a transformer-based policy using a weighted loss function and improved temporal ensembling for deployment.
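The weighted loss mentioned above can be illustrated with a minimal sketch. It assumes an L1 loss over a predicted action chunk with weights that decay exponentially with the step index, so near-future actions count more. The function names, the `decay` parameter, and the normalization are illustrative assumptions; the summary does not give the paper's exact formulation.

```python
import numpy as np


def decayed_chunk_weights(chunk_size: int, decay: float) -> np.ndarray:
    """Weights proportional to exp(-decay * k) for step k, normalised to sum to 1.

    Assumption: the exact weighting used in the paper is not stated here;
    this is one plausible exponentially decaying form.
    """
    steps = np.arange(chunk_size)
    weights = np.exp(-decay * steps)
    return weights / weights.sum()


def weighted_l1_loss(pred, target, decay: float = 0.05) -> float:
    """Weighted L1 loss over one action chunk.

    pred, target: arrays of shape (chunk_size, action_dim).
    Errors on early (near-future) steps receive larger weights.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Mean absolute error per step of the chunk, shape (chunk_size,)
    per_step = np.abs(pred - target).mean(axis=1)
    weights = decayed_chunk_weights(len(per_step), decay)
    return float((weights * per_step).sum())
```

With `decay=0` the weights are uniform and the function reduces to a plain mean L1 loss over the chunk. Larger values concentrate the penalty on the first few predicted actions.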

Key results

  • Successful execution of texture-dependent zip tie insertion and Velcro fastening tasks
  • ~11.0% average relative improvement in success rate over state-of-the-art tactile methods
  • Effective multimodal fusion of tactile, visual, and proprioceptive features via a ResNet encoder and an ACT policy
  • Enhanced action precision through exponentially decaying loss weighting and temporal proximity ensembling
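Temporal ensembling with action chunking can be sketched as follows. At each control step, several earlier chunks contain a prediction for the current step, and these are combined by a weighted average. The sketch assumes that the weight decays with the age of the prediction, so the most recent chunks dominate. That is one reading of "temporal proximity" and not necessarily the paper's exact scheme. `ensemble_action`, the `chunks` dictionary layout, and `decay` are hypothetical names chosen for illustration.

```python
import numpy as np


def ensemble_action(chunks: dict, t: int, decay: float = 0.1) -> np.ndarray:
    """Combine all chunk predictions that cover control step t.

    chunks: maps the step s at which a chunk was predicted to an array of
            shape (chunk_size, action_dim); row k is the action for step s + k.
    Assumption: weights are exp(-decay * age) with age = t - s, so newer
    predictions receive more weight.
    """
    candidates, ages = [], []
    for s, chunk in chunks.items():
        age = t - s
        if 0 <= age < len(chunk):  # this chunk has a prediction for step t
            candidates.append(np.asarray(chunk[age], dtype=float))
            ages.append(age)
    if not candidates:
        raise ValueError(f"no chunk covers step {t}")
    weights = np.exp(-decay * np.asarray(ages, dtype=float))
    weights /= weights.sum()
    # Weighted average over the candidate actions, shape (action_dim,)
    return (weights[:, None] * np.stack(candidates)).sum(axis=0)
```

Setting `decay=0` gives a plain average over all overlapping predictions, while a large `decay` approaches executing only the newest chunk's prediction.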

Why it matters

Provides an open-source hardware-software framework and demonstrates that tactile sensing is essential for texture-sensitive manipulation, which benefits roboticists developing dexterous and robust manipulation systems.

Abstract

Tactile texture is vital for robotic manipulation but challenging for camera vision-based observation. To address this, we propose TactileAloha, an integrated tactile-vision robotic system built upon Aloha, with a tactile sensor mounted on the gripper to capture fine-grained texture information and support real-time visualization during teleoperation, facilitating efficient data collection and manipulation. Using data collected from our integrated system, we encode tactile signals with a pre-trained ResNet and fuse them with visual and proprioceptive features. The combined observations are processed by a transformer-based policy with action chunking to predict future actions. We use a weighted loss function during training to emphasize near-future actions, and employ an improved temporal aggregation scheme at deployment to enhance action precision. Experimentally, we introduce two bimanual tasks: zip tie insertion and Velcro fastening, both requiring tactile sensing to perceive the object texture and align two object orientations by two hands. Our proposed method adaptively changes the generated manipulation sequence itself based on tactile sensing in a systematic manner. Results show that our system, leveraging tactile information, can handle texture-related tasks that camera vision-based methods fail to address. Moreover, our method achieves an average relative improvement of approximately 11.0% compared to state-of-the-art method with tactile input, demonstrating its performance.
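For reading the final figure, it may help to recall how a relative improvement is usually defined. The following is the standard definition, not a formula quoted from the paper. The symbols $s_i^{\text{ours}}$ and $s_i^{\text{base}}$ (success rates of the proposed method and the tactile baseline in evaluation setting $i$) and the averaging over $N$ settings are assumptions about how the roughly 11.0% was obtained.

```latex
\text{avg. relative improvement}
  = \frac{1}{N} \sum_{i=1}^{N}
    \frac{s_i^{\text{ours}} - s_i^{\text{base}}}{s_i^{\text{base}}}
    \times 100\%
```

Under this definition the figure is a ratio to the baseline's success rate, not a difference in percentage points.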

Index terms

Imitation Learning; Bimanual Manipulation; Hardware-Software Integration in Robotics

Related papers