ICRA 2026

Physics-Informed Machine Learning for Efficient Sim-To-Real Data Augmentation in Micro-Object Pose Estimation

Tan Zongcai, Lan Wei, Dandan Zhang

AI summary

Integrating wave optics physics into a GAN framework significantly improves synthetic microscope image fidelity and enables accurate, real-time sim-to-real data augmentation for microrobot pose estimation.

Physics-informed machine learning, Sim-to-real augmentation, Microrobot pose estimation, Wave optics simulation, PixelGAN, Digital twin

Problem

Acquiring large, high-quality labeled datasets for optical microrobot pose estimation is costly and difficult, while existing simulation methods lack either physical accuracy or computational efficiency.

Approach

A physics-informed deep generative framework that combines wave optics-based rendering with a PixelGAN to simulate realistic microscope images and align them with real experimental data for efficient data augmentation.
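
To make the idea of wave optics-based rendering concrete, the sketch below propagates a complex optical field with the angular spectrum method, a standard scalar-diffraction technique. It is not the paper's renderer: the function name, the toy circular aperture, and the wavelength, pixel size, and defocus values are all assumptions chosen for illustration. It only shows how defocus along the optical axis can produce depth-dependent blur and ring-like diffraction patterns of the kind the framework aims to reproduce.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pixel_size, z):
    """Propagate a complex 2-D field by a distance z.

    All lengths share one unit (micrometres in the toy example below).
    Hypothetical helper for illustration, not the authors' code.
    """
    ny, nx = field.shape
    fy = np.fft.fftfreq(ny, d=pixel_size)  # spatial frequencies along rows
    fx = np.fft.fftfreq(nx, d=pixel_size)  # spatial frequencies along columns
    FY, FX = np.meshgrid(fy, fx, indexing="ij")

    # Squared axial spatial frequency; negative values are evanescent waves.
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    propagating = arg > 0
    kz = 2.0 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))

    # Free-space transfer function, with evanescent components discarded.
    transfer = np.where(propagating, np.exp(1j * kz * z), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Toy scene (assumed values): a transmitting disc of radius 10 pixels.
n = 128
half = n // 2
yy, xx = np.mgrid[-half:half, -half:half]
aperture = (xx**2 + yy**2 <= 10**2).astype(complex)

wavelength_um = 0.5  # assumed illumination wavelength
pixel_um = 0.5       # assumed pixel size in the object plane

in_focus = np.abs(angular_spectrum_propagate(aperture, wavelength_um, pixel_um, 0.0)) ** 2
defocused = np.abs(angular_spectrum_propagate(aperture, wavelength_um, pixel_um, 20.0)) ** 2
```

Comparing `in_focus` with `defocused` shows the sharp disc edge spreading into diffraction fringes as the defocus distance grows. In the framework described here, a physics-based rendering of this general kind is combined with a PixelGAN that aligns the result with real microscope images.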

Key results

Improved SSIM by 35.6% over purely AI-driven methods while maintaining real-time rendering speeds (0.022 s/frame)
Pose estimator trained on synthetic data achieved 93.9%/91.9% pitch/roll accuracy, only 5.0%/5.4% below an estimator trained exclusively on real data
Framework generalizes to unseen microrobot poses without additional training data
Achieved high-fidelity simulation of complex optical effects like diffraction rings and depth-dependent blur
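
For readers unfamiliar with the SSIM metric quoted above, the following is a minimal sketch of a simplified, single-window (global) SSIM and of a relative-improvement calculation. The standard metric is usually computed over local sliding windows and then averaged, so this version is an approximation for illustration only. The function names are hypothetical, the example scores are placeholders rather than values from the paper, and treating the reported 35.6% as a relative gain is an assumption.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed over the whole image as a single window."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def relative_improvement(new_score, baseline_score):
    """Fractional gain of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score

# Placeholder scores (NOT from the paper): a baseline of 0.500 rising to
# 0.678 corresponds to a 35.6% relative improvement.
example_gain = relative_improvement(0.678, 0.500)
```

An image compared with itself scores 1, and any distortion lowers the score, which is why a higher SSIM between synthetic and real microscope images indicates better structural fidelity.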

Why it matters

Enables cost-effective, scalable training data generation for microrobot perception, advancing autonomous biomedical manipulation and micro-scale tracking applications.

Abstract

Precise pose estimation of optical microrobots is essential for enabling high-precision object tracking and autonomous biological studies. However, current methods rely heavily on large, high-quality microscope image datasets, which are difficult and costly to acquire due to the complexity of microrobot fabrication and the labour-intensive labelling. Digital twin systems offer a promising path for sim-to-real data augmentation, yet existing techniques struggle to replicate complex optical microscopy phenomena, such as diffraction artifacts and depth-dependent imaging. This work proposes a novel physics-informed deep generative learning framework that, for the first time, integrates wave optics-based physical rendering and depth alignment into a generative adversarial network (GAN), to synthesise high-fidelity microscope images for microrobot pose estimation efficiently. Our method improves the structural similarity index (SSIM) by 35.6% compared to purely AI-driven methods, while maintaining real-time rendering speeds (0.022 s/frame). The pose estimator (CNN backbone) trained on our synthetic data achieves 93.9%/91.9% (pitch/roll) accuracy, just 5.0%/5.4% (pitch/roll) below that of an estimator trained exclusively on real data. Furthermore, our framework generalises to unseen poses, enabling data augmentation and robust pose estimation for novel microrobot configurations without additional training data.
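
The headline figures in the abstract can be unpacked with a little arithmetic, sketched below. The frame rate follows directly from the reported 0.022 s/frame. The implied real-data accuracies rest on the assumption that the stated 5.0%/5.4% gaps are absolute percentage points; the abstract does not say whether they are absolute or relative, so those two values should be read as an interpretation, not as reported results.

```python
# Reported rendering time per frame, from the abstract.
seconds_per_frame = 0.022
frames_per_second = 1.0 / seconds_per_frame  # roughly 45 frames per second

# Reported accuracy (%) of the estimator trained on synthetic data.
synthetic_accuracy = {"pitch": 93.9, "roll": 91.9}
# Reported gap (%) to the estimator trained exclusively on real data.
gap_to_real = {"pitch": 5.0, "roll": 5.4}

# ASSUMPTION: the gaps are absolute percentage points, not relative drops.
implied_real_accuracy = {
    axis: round(synthetic_accuracy[axis] + gap_to_real[axis], 1)
    for axis in synthetic_accuracy
}
```

At about 45 frames per second, rendering keeps pace with typical video frame rates, which is consistent with the abstract's description of the speed as real-time.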

Index terms

Micro/Nano Robots; Computer Vision for Medical Robotics; Deep Learning for Visual Perception