ICRA 2026

TwinTrack: Bridging Vision and Contact Physics for Real-Time Tracking of Unknown Objects in Contact-Rich Scenes

Wen Yang, Zhixian Xie, Yiting Wang, Vamsi Sai Abhijit Tadepalli, Heni Ben Amor, Shan Lin, Wanxin Jin

AI summary

TwinTrack achieves robust, real-time 6-DoF tracking of unseen objects in highly occluded scenes by fusing visual cues with learned contact physics.
Physics-aware tracking, Contact dynamics, Real-time pose estimation, Vision-robotics fusion, Gaussian splatting, MJX simulation

Problem

Pure vision-based tracking fails under heavy occlusions and motion blur in contact-rich scenes. The paper addresses the challenge of robustly estimating the 6-DoF pose of unseen, dynamic objects when visual observations are frequently degraded by contact interactions.

Approach

The framework uses a dual-loop architecture: Real2Sim learns object geometry and physical properties by aligning visual reconstructions with contact dynamics, while Sim2Real adaptively fuses visual tracking with physics simulation predictions to output robust poses.
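One way to picture the Sim2Real fusion step is a confidence-weighted blend of the two pose estimates. The sketch below is an illustrative assumption, not TwinTrack's actual fusion rule: `fuse_pose`, `slerp`, and the `visual_confidence` score are hypothetical names, and the summary does not say how the real system decides how much to trust each source.

```python
import math

def slerp(q0, q1, t):
    """Spherical interpolation between unit quaternions given as (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # flip one quaternion so we follow the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(1.0, dot)
    theta = math.acos(dot)
    if theta < 1e-6:  # rotations are nearly identical
        return tuple(q0)
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))

def fuse_pose(visual, physics, visual_confidence):
    """Blend a visual pose with a physics-predicted pose.

    Each pose is (position_xyz, quaternion_wxyz). `visual_confidence` in
    [0, 1] is a hypothetical score: 0 trusts only the physics prediction
    (e.g. under full occlusion), 1 trusts only the visual tracker.
    """
    alpha = max(0.0, min(1.0, visual_confidence))
    p_vis, q_vis = visual
    p_phys, q_phys = physics
    position = tuple((1.0 - alpha) * a + alpha * b for a, b in zip(p_phys, p_vis))
    rotation = slerp(q_phys, q_vis, alpha)
    return position, rotation

# Example: the visual tracker is half-trusted during a partial occlusion.
physics_pose = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))
visual_pose = ((1.0, 0.0, 0.0),
               (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)))
fused_position, fused_rotation = fuse_pose(visual_pose, physics_pose, 0.5)
```

Interpolating the rotation on the quaternion sphere, rather than averaging components, keeps the fused orientation a valid unit quaternion.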

Key results

  • Real2Sim jointly estimates geometry and physical parameters via contact dynamics alignment
  • Sim2Real adaptively fuses visual tracking with physics simulation for robust pose estimation
  • Tracks in real time at above 20 Hz on a GPU-accelerated, customized MJX engine
  • Delivers more robust and accurate tracking than the baselines in the object-falling and multi-fingered in-hand manipulation scenarios

Why it matters

Enables reliable perception for robotic manipulation and scene understanding in highly dynamic, occluded environments where pure vision fails.

Abstract

Real-time tracking of previously unseen, highly dynamic objects in contact-rich scenes, such as during dexterous in-hand manipulation, remains a major challenge. Pure vision-based approaches often fail under heavy occlusions due to frequent contact interactions and motion blur caused by abrupt impacts. We propose TwinTrack, a physics-aware perception system that enables robust, real-time 6-DoF pose tracking of unknown dynamic objects in contact-rich scenes by leveraging contact physics cues. At its core, TwinTrack integrates Real2Sim and Sim2Real. Real2Sim combines vision and contact physics to jointly estimate object geometry and physical properties: an initial reconstruction is obtained from vision, then refined by learning a geometry residual and simultaneously estimating physical parameters (e.g., mass, inertia, and friction) based on contact dynamics consistency. Sim2Real achieves robust pose estimation by adaptively fusing a visual tracker with predictions from the updated contact dynamics. TwinTrack is implemented on a GPU-accelerated, customized MJX engine to guarantee real-time performance. We evaluate our method on two contact-rich scenarios: object falling with environmental contacts and multi-fingered in-hand manipulation. Results show that, compared to baselines, TwinTrack delivers significantly more robust, accurate, and real-time tracking in these challenging settings, with tracking speeds above 20 Hz.
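The idea of estimating physical parameters from "contact dynamics consistency" can be illustrated with a toy problem: choose the friction coefficient whose simulated motion best matches the observed motion. The sketch below is only an assumption-laden illustration, not the paper's pipeline. It uses a 1D block sliding under Coulomb friction instead of an MJX simulation, and `simulate_velocities`, `consistency_loss`, and `fit_friction` are hypothetical names.

```python
G = 9.81  # gravitational acceleration in m/s^2

def simulate_velocities(mu, v0, dt, steps):
    """Roll out a block sliding on a flat surface under Coulomb friction."""
    velocities = []
    v = v0
    for _ in range(steps):
        v = max(0.0, v - mu * G * dt)  # friction decelerates the block until rest
        velocities.append(v)
    return velocities

def consistency_loss(mu, observed, v0, dt):
    """Sum of squared differences between simulated and observed velocities."""
    simulated = simulate_velocities(mu, v0, dt, len(observed))
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))

def fit_friction(observed, v0, dt, mu_init=0.1, lr=0.004, iters=200, eps=1e-5):
    """Gradient descent on the loss with a central finite-difference gradient.

    The step size `lr` is hand-tuned for this toy problem (an assumption).
    """
    mu = mu_init
    for _ in range(iters):
        grad = (consistency_loss(mu + eps, observed, v0, dt)
                - consistency_loss(mu - eps, observed, v0, dt)) / (2.0 * eps)
        mu = max(0.0, mu - lr * grad)  # friction coefficients are non-negative
    return mu

# Synthetic "observations" generated with a known friction coefficient of 0.3.
observed = simulate_velocities(0.3, v0=2.0, dt=0.02, steps=20)
mu_hat = fit_friction(observed, v0=2.0, dt=0.02)  # should land close to 0.3
```

In the system described above, the same kind of consistency objective would also cover mass, inertia, and a geometry residual, with the rollout provided by the contact simulator rather than a closed-form update.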

Index terms

Perception for Grasping and Manipulation; Visual Tracking; Contact Modeling
