ICRA 2026

Vi-TacMan: Articulated Object Manipulation Via Vision and Touch

Leiyao Cui, Zihang Zhao, Sirui Xie, Wenhuan Zhang, Zhi Han, Yixin Zhu

AI summary

Coarse visual cues combined with tactile contact regulation enable reliable, model-free manipulation of unseen articulated objects.
articulated manipulation, vision-touch synergy, tactile control, kinematic-invariant, robotic grasping, embodied AI

Problem

Vision-based methods yield imprecise kinematic estimates on unfamiliar objects, while tactile approaches require accurate initialization, leaving a gap in robust, generalized articulated manipulation.

Approach

Vi-TacMan uses vision to detect interaction parts and estimate coarse motion directions, which initialize a tactile controller that refines execution through real-time contact regulation.

Key results

  • 0.86 mAP detection of movable and holdable parts
  • Surface normals as geometric priors significantly reduce direction estimation error
  • von Mises-Fisher distributions model directional uncertainty under ambiguity
  • Successful manipulation across more than 50,000 simulated objects and diverse real-world objects, without explicit kinematic models
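
To make the von Mises-Fisher bullet concrete, here is a minimal sketch of how a vMF distribution on the unit sphere can represent a direction together with its uncertainty. It is an illustration under stated assumptions, not the paper's implementation: the function names `vmf_log_density` and `fit_vmf` are hypothetical, and the concentration estimate uses the standard closed-form approximation κ ≈ r̄(3 − r̄²)/(1 − r̄²) for three dimensions, which may differ from whatever the authors use.

```python
import math
import numpy as np


def vmf_log_density(x, mu, kappa):
    """Log-density of a 3D von Mises-Fisher distribution at unit vector(s) x.

    Density: f(x) = kappa / (4*pi*sinh(kappa)) * exp(kappa * mu.x), kappa > 0.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # log(4*pi*sinh(k)) = log(2*pi) + k + log(1 - exp(-2k)), stable for large k.
    log_norm = (
        math.log(kappa)
        - math.log(2.0 * math.pi)
        - kappa
        - math.log1p(-math.exp(-2.0 * kappa))
    )
    return log_norm + kappa * (x @ mu)


def fit_vmf(directions):
    """Estimate mean direction and concentration from direction samples.

    Assumption: uses the common closed-form approximation for kappa in 3D;
    it is undefined if all samples are exactly identical (r_bar == 1).
    """
    d = np.asarray(directions, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # project onto unit sphere
    resultant = d.mean(axis=0)
    r_bar = np.linalg.norm(resultant)  # mean resultant length in (0, 1)
    mu = resultant / r_bar
    kappa = r_bar * (3.0 - r_bar**2) / (1.0 - r_bar**2)
    return mu, kappa


if __name__ == "__main__":
    # Hypothetical noisy direction candidates clustered around +z.
    rng = np.random.default_rng(0)
    candidates = np.array([0.0, 0.0, 1.0]) + 0.1 * rng.normal(size=(500, 3))
    mu, kappa = fit_vmf(candidates)
    print("mean direction:", mu)
    print("concentration:", kappa)
```

A large κ indicates that the direction candidates agree closely, while a small κ signals ambiguity; a downstream controller could, in principle, treat low-concentration estimates as coarser initial guesses.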

Why it matters

Enables household robots to reliably manipulate diverse, unseen articulated objects in unstructured environments without requiring precise prior modeling.

Abstract

Autonomous manipulation of articulated objects remains a fundamental challenge for robots in human environments. Vision-based methods can infer hidden kinematics but can yield imprecise estimates on unfamiliar objects. Tactile approaches achieve robust control through contact feedback but require accurate initialization. This suggests a natural synergy: vision for global guidance, touch for local precision. Yet no framework systematically exploits this complementarity for generalized articulated manipulation. Here we present Vi-TacMan, which uses vision to propose grasps and coarse directions that seed a tactile controller for precise execution. By incorporating surface normals as geometric priors and modeling directions via von Mises-Fisher (vMF) distributions, our approach achieves significant gains over baselines (all p < 0.0001). Critically, manipulation succeeds without explicit kinematic models: the tactile controller refines coarse visual estimates through real-time contact regulation. Tests on more than 50,000 simulated and diverse real-world objects confirm robust cross-category generalization. This work establishes that coarse visual cues suffice for reliable manipulation when coupled with tactile feedback, offering a scalable paradigm for autonomous systems in unstructured environments.

Index terms

Force and Tactile Sensing; Soft Sensors and Actuators; Sensor-based Control
