TwinTrack: Bridging Vision and Contact Physics for Real-Time Tracking of Unknown Objects in Contact-Rich Scenes
Wen Yang, Zhixian Xie, Yiting Wang, Vamsi Sai Abhijit Tadepalli, Heni Ben Amor, Shan Lin, Wanxin Jin
AI summary
Problem
Pure vision-based tracking fails under heavy occlusions and motion blur in contact-rich scenes. The paper addresses the challenge of robustly estimating the 6-DoF pose of unseen, dynamic objects when visual observations are frequently degraded by contact interactions.
Approach
The framework uses a dual-loop architecture: Real2Sim learns object geometry and physical properties by aligning visual reconstructions with contact dynamics, while Sim2Real adaptively fuses visual tracking with physics simulation predictions to output robust poses.
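To make the Sim2Real fusion idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical scalar `vision_confidence` in [0, 1] (for example, derived from occlusion or blur estimates) and blends a vision-tracked pose with a physics-predicted pose: positions are interpolated linearly and orientations by quaternion slerp. The function names and the confidence signal are illustrative assumptions; the summary does not specify how the fusion weights are computed.

```python
import math

def slerp(q0, q1, t):
    """Spherical interpolation between unit quaternions given as (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q encode the same rotation; take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly identical rotations: use linear interpolation
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def fuse_pose(vision_pose, physics_pose, vision_confidence):
    """Blend two (position, quaternion) poses.

    vision_confidence is a hypothetical score: 1.0 trusts the visual tracker
    fully, 0.0 falls back entirely on the physics prediction.
    """
    w = min(max(vision_confidence, 0.0), 1.0)
    p_vis, q_vis = vision_pose
    p_phys, q_phys = physics_pose
    position = tuple((1.0 - w) * a + w * b for a, b in zip(p_phys, p_vis))
    rotation = slerp(q_phys, q_vis, w)
    return position, rotation
```

With a low confidence (for instance during heavy occlusion), the output stays close to the physics prediction; as confidence rises, it moves toward the visual estimate.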
Key results
- Real2Sim jointly estimates geometry and physical parameters via contact dynamics alignment
- Sim2Real adaptively fuses visual tracking with physics simulation for robust pose estimation
- Tracks in real time at above 20 Hz on a GPU-accelerated, customized MJX engine
- Delivers more robust and accurate tracking than the baselines in two scenarios: object falling with environmental contacts and multi-fingered in-hand manipulation
Why it matters
Enables reliable perception for robotic manipulation and scene understanding in highly dynamic, occluded environments where pure vision fails.
Abstract
Real-time tracking of previously unseen, highly dynamic objects in contact-rich scenes, such as during dexterous in-hand manipulation, remains a major challenge. Pure vision-based approaches often fail under heavy occlusions due to frequent contact interactions and motion blur caused by abrupt impacts. We propose TwinTrack, a physics-aware perception system that enables robust, real-time 6-DoF pose tracking of unknown dynamic objects in contact-rich scenes by leveraging contact physics cues. At its core, TwinTrack integrates Real2Sim and Sim2Real. Real2Sim combines vision and contact physics to jointly estimate object geometry and physical properties: an initial reconstruction is obtained from vision, then refined by learning a geometry residual and simultaneously estimating physical parameters (e.g., mass, inertia, and friction) based on contact dynamics consistency. Sim2Real achieves robust pose estimation by adaptively fusing a visual tracker with predictions from the updated contact dynamics. TwinTrack is implemented on a GPU-accelerated, customized MJX engine to guarantee real-time performance. We evaluate our method on two contact-rich scenarios: object falling with environmental contacts and multi-fingered in-hand manipulation. Results show that, compared to baselines, TwinTrack delivers significantly more robust, accurate, and real-time tracking in these challenging settings, with tracking speeds above 20 Hz.
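As a toy illustration of estimating a physical parameter from contact dynamics consistency, the sketch below fits a single friction coefficient so that a simulated trajectory matches an observed one. It is an assumption-laden stand-in, not the Real2Sim procedure: the dynamics are a 1-D block sliding under Coulomb friction rather than an MJX contact model, the search is a derivative-free golden-section search, and all function names and parameter ranges are hypothetical.

```python
def simulate_slide(mu, v0, dt, steps, g=9.81):
    """Velocities of a block sliding on a flat surface with Coulomb friction."""
    v, velocities = v0, []
    for _ in range(steps):
        v = max(v - mu * g * dt, 0.0)  # friction decelerates until the block stops
        velocities.append(v)
    return velocities

def trajectory_loss(mu, observed, v0, dt):
    """Sum of squared differences between simulated and observed velocities."""
    simulated = simulate_slide(mu, v0, dt, len(observed))
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))

def fit_friction(observed, v0, dt, lo=0.0, hi=1.5, iters=60):
    """Golden-section search for the friction coefficient minimizing the loss.

    The bounds lo and hi are assumed plausible limits for the coefficient.
    """
    ratio = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if trajectory_loss(c, observed, v0, dt) < trajectory_loss(d, observed, v0, dt):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return 0.5 * (a + b)
```

For example, generating `observed = simulate_slide(0.4, 3.0, 0.01, 50)` and calling `fit_friction(observed, 3.0, 0.01)` recovers a coefficient close to 0.4. A full system would instead estimate several parameters jointly (mass, inertia, friction) together with a geometry residual, as the abstract describes.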