From Simulation to Deployment: Curriculum-Based Domain Adaptation for Semantic Segmentation in Autonomous Forklifts
Christof Schützenhöfer, Patrick Rechberger, Thomas Ulz, Christian Steger
AI summary
Problem
Deploying semantic segmentation models for autonomous forklifts is hindered by visual variations across industrial sites, leading to poor cross-domain generalization and expensive re-annotation efforts.
Approach
The method progressively adapts a model by pretraining on synthetic data, fine-tuning on a labeled real-world source, and transferring to a new target domain using filtered pseudo-labels combined with anchor samples to prevent drift and handle class imbalance.
Key results
- Progressive sim-to-real-to-real training pipeline reduces annotation overhead
- mIoU increases from 67.37 to 71.36 under moderate domain shift
- mIoU increases from 49.57 to 57.22 under hard domain shift
- Anchor replay and class-aware filtering stabilize adaptation and mitigate class imbalance
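The results are reported as mIoU. As a reference point, here is a minimal sketch of the standard definition: per-class IoU = TP / (TP + FP + FN) from a confusion matrix, averaged over classes and expressed in percent. How the paper treats classes absent from a split is not stated; this sketch assumes they are skipped, and the ignore value 255 is hypothetical.

```python
import numpy as np


def confusion_matrix(pred: np.ndarray, target: np.ndarray, num_classes: int,
                     ignore_index: int = 255) -> np.ndarray:
    """Rows = ground-truth class, columns = predicted class; ignored pixels dropped."""
    valid = target != ignore_index
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf: np.ndarray) -> float:
    """mIoU in percent; classes with no ground-truth and no predicted pixels are skipped."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c but labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled class c but predicted otherwise
    denom = tp + fp + fn
    return float(100.0 * (tp[denom > 0] / denom[denom > 0]).mean())


conf = confusion_matrix(np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1]), num_classes=2)
print(round(mean_iou(conf), 2))  # 58.33 (IoU 0.5 and 2/3)
```

On this scale, the reported improvements correspond to +3.99 mIoU points (67.37 to 71.36) under moderate shift and +7.65 points (49.57 to 57.22) under hard shift.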
Why it matters
It enables scalable, cost-effective deployment of robust visual perception systems for industrial robotics across diverse and changing warehouse environments.
Abstract
Deploying semantic segmentation models for autonomous forklifts in industrial environments is challenging because visual conditions vary across sites, leading to poor cross-domain generalization and costly re-annotation efforts. We propose a curriculum-based domain adaptation framework that progressively transfers a segmentation model from simulation to real-world industrial deployment. The model is first pretrained on synthetic datasets with increasing complexity, then fine-tuned on a labeled real source domain to reduce the sim-to-real gap and adapt to camera-specific characteristics. Finally, it is adapted to a new target domain using pseudo-label-based self-training. To reduce drift during target adaptation, pseudo-labeled target samples are combined with labeled samples from the source-real domain, while a replay buffer improves robustness to class imbalance by oversampling rare classes. Preliminary experiments with DDRNet demonstrate improved performance under both moderate and hard domain shifts, with mIoU gains from 67.37 to 71.36 and from 49.57 to 57.22, respectively. The results highlight the potential of progressive multi-domain adaptation for scalable industrial robotic perception.
Keywords: semantic segmentation, synthetic data, pseudo labeling
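The abstract combines pseudo-labeled target samples with labeled source-real samples and uses a replay buffer that oversamples rare classes, but it gives no sampling rule or mixing ratio. The sketch below is one illustrative way to compose such batches under stated assumptions: `rare_class_weights`, `mixed_batch`, the 25% anchor fraction, and the "inverse frequency of the rarest class in the sample" weighting are all hypothetical choices, not the paper's implementation.

```python
import random
from collections import Counter


def rare_class_weights(samples):
    """Weight each (sample_id, classes_present) by the inverse frequency of its rarest class.

    Class frequency is the number of samples that contain the class (hypothetical scheme).
    """
    freq = Counter(c for _, classes in samples for c in classes)
    return [max((1.0 / freq[c] for c in classes), default=0.0) for _, classes in samples]


def mixed_batch(target_ids, anchor_samples, batch_size=8, anchor_fraction=0.25, rng=None):
    """Build one batch of pseudo-labeled target samples plus labeled source-real anchors.

    Target samples are drawn uniformly without replacement; anchors are replayed
    with replacement, biased toward samples containing rare classes.
    """
    rng = rng or random.Random()
    n_anchor = round(batch_size * anchor_fraction)
    weights = rare_class_weights(anchor_samples)
    anchor_ids = [sid for sid, _ in anchor_samples]
    anchors = rng.choices(anchor_ids, weights=weights, k=n_anchor)
    targets = rng.sample(target_ids, batch_size - n_anchor)
    return [("target", t) for t in targets] + [("anchor", a) for a in anchors]


anchors = [("a0", {0, 1}), ("a1", {0, 1}), ("a2", {0, 2})]
print(rare_class_weights(anchors))  # [0.5, 0.5, 1.0]
batch = mixed_batch([f"t{i}" for i in range(20)], anchors, rng=random.Random(0))
```

With these weights, `a2` (the only sample containing class 2) is drawn with probability 0.5 per anchor slot, twice as often as `a0` or `a1`. Keeping labeled anchors in every batch is what gives the model a fixed reference that limits drift toward noisy pseudo-labels.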