Physics-Informed Machine Learning for Efficient Sim-To-Real Data Augmentation in Micro-Object Pose Estimation
Tan Zongcai, Lan Wei, Dandan Zhang
AI summary
Problem
Acquiring large, high-quality labeled datasets for optical microrobot pose estimation is costly and difficult, while existing simulation methods lack either physical accuracy or computational efficiency.
Approach
A physics-informed deep generative framework that combines wave optics-based rendering with a PixelGAN to simulate realistic microscope images and align them with real experimental data for efficient data augmentation.
Key results
- Improved SSIM by 35.6% over purely AI-driven methods while maintaining real-time rendering speeds (0.022 s/frame)
- Pose estimator trained on synthetic data achieved 93.9%/91.9% pitch/roll accuracy, only 5.0%/5.4% below an estimator trained exclusively on real data
- Framework generalizes to unseen microrobot poses without additional training data
- Achieved high-fidelity simulation of complex optical effects like diffraction rings and depth-dependent blur
Why it matters
Enables cost-effective, scalable training data generation for microrobot perception, advancing autonomous biomedical manipulation and micro-scale tracking applications.
Abstract
Precise pose estimation of optical microrobots is essential for high-precision object tracking and autonomous biological studies. However, current methods rely heavily on large, high-quality microscope image datasets, which are difficult and costly to acquire because microrobot fabrication is complex and labelling is labour-intensive. Digital twin systems offer a promising path for sim-to-real data augmentation, yet existing techniques struggle to replicate complex optical microscopy phenomena, such as diffraction artifacts and depth-dependent imaging. This work proposes a novel physics-informed deep generative learning framework that, for the first time, integrates wave optics-based physical rendering and depth alignment into a generative adversarial network (GAN) to efficiently synthesise high-fidelity microscope images for microrobot pose estimation. Our method improves the structural similarity index (SSIM) by 35.6% compared to purely AI-driven methods, while maintaining real-time rendering speeds (0.022 s/frame). The pose estimator (CNN backbone) trained on our synthetic data achieves 93.9%/91.9% (pitch/roll) accuracy, just 5.0%/5.4% (pitch/roll) below that of an estimator trained exclusively on real data. Furthermore, our framework generalises to unseen poses, enabling data augmentation and robust pose estimation for novel microrobot configurations without additional training data.
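As a rough illustration of the SSIM metric quoted above, the sketch below computes a simplified, single-window (global) SSIM between two grayscale images. It is not the authors' evaluation code: the standard SSIM averages the same expression over local windows, and the function name `global_ssim`, the constants `k1` and `k2` (the commonly used defaults), and the random test images are all assumptions made for this example.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Simplified SSIM computed over the whole image as one window.

    Assumption: this is an illustrative variant. The standard metric
    averages this expression over local (often Gaussian) windows.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")

    # Stabilising constants, scaled by the dynamic range of the pixel values.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Hypothetical stand-ins for a real microscope frame and a simulated one.
rng = np.random.default_rng(0)
real = rng.random((64, 64))
simulated = np.clip(real + 0.1 * rng.standard_normal(real.shape), 0.0, 1.0)

same_score = global_ssim(real, real)        # identical images
noisy_score = global_ssim(real, simulated)  # perturbed copy scores lower
```

A score of 1 indicates identical images, and lower values indicate larger structural differences. For actual evaluation, a windowed implementation from an established image-processing library would be the more faithful choice.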