Unsupervised Domain Adaptation for Robust Imitation Learning under Visual Perturbations
Yasuhiro Kato, Thomas Westfechtel, Jen-Yen Chang, Naoki Morihira, Akinobu Hayashi, Tatsuya Harada, Takayuki Osa
AI summary
Problem
Vision-based imitation learning policies degrade under visual domain shifts such as lighting and background changes. Standard data augmentation alone does not make them robust or adaptable, because training relies on a fixed demonstration dataset.
Approach
The method uses adversarial feature learning to extract augmentation-invariant representations, followed by unsupervised domain adaptation that fine-tunes the policy using only initial target-domain images via a domain discriminator.
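One common way to write such an adversarial objective is the generic domain-adversarial min-max form below. It is a sketch under assumed notation, not the paper's confirmed loss: the encoder f_θ, policy head π_φ, discriminator D_ψ, augmented observation õ, expert action u, augmentation or domain indicator c, and weight λ are all symbols introduced here for illustration.

```latex
% Assumed notation (not from the source): f_\theta encoder, \pi_\phi policy head,
% D_\psi discriminator, \tilde{o} augmented observation, u expert action,
% c augmentation/domain indicator, \lambda > 0 trade-off weight.
\min_{\theta,\phi}\;\max_{\psi}\;
  \underbrace{\mathbb{E}_{(\tilde{o},u)}\!\left[\ell_{\mathrm{BC}}\big(\pi_\phi(f_\theta(\tilde{o})),\,u\big)\right]}_{\text{imitation loss}}
  \;-\;\lambda\,
  \underbrace{\mathbb{E}_{(\tilde{o},c)}\!\left[\ell_{\mathrm{CE}}\big(D_\psi(f_\theta(\tilde{o})),\,c\big)\right]}_{\text{discriminator loss}}
```

In this form the discriminator (ψ) lowers its classification loss, while the encoder (θ) raises that loss and lowers the imitation loss, so features that reveal the indicator c are penalized.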
Key results
- Preserves source performance while enhancing resilience to lighting, background, and distractor shifts
- Adapts to new domains using only initial unlabeled images, eliminating costly demonstration collection
- Retains task-relevant information in the learned features while minimizing their mutual information with augmentation and domain indicators
- Validated across MuJoCo simulations and real-world bimanual robotic manipulation tasks
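The link between a discriminator and mutual information in the list above can be made concrete with a standard variational bound. The symbols Z (learned feature), C (augmentation or domain indicator), and q_ψ (discriminator) are assumed notation for this sketch, and the source does not state which estimator the authors use.

```latex
% Assumed notation: Z = f_\theta(\tilde{o}) feature, C indicator, q_\psi(c \mid z) discriminator.
I(Z;C) \;=\; H(C) - H(C \mid Z)
       \;\ge\; H(C) - \mathbb{E}_{(z,c)}\!\left[-\log q_\psi(c \mid z)\right]
% The inequality holds because the cross-entropy of any q_\psi upper-bounds H(C \mid Z);
% it is tight when q_\psi(c \mid z) equals the true posterior p(c \mid z).
```

Under this reading, if even the best discriminator cannot beat chance (its cross-entropy equals H(C)), the bound is zero, which is consistent with features that carry little information about the indicator.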
Why it matters
Enables cost-effective deployment of vision-based robot policies in dynamic real-world settings without requiring expensive re-demonstration data collection.
Abstract
Vision-based robot manipulation systems often suffer from performance degradation under domain shifts in visual inputs. While data augmentation is commonly employed in reinforcement learning, its application in imitation learning remains relatively underexplored. Our preliminary experiments indicate that simply incorporating augmentation techniques does not yield effective improvements in imitation learning. To address this challenge, we propose a two-stage learning process. First, we develop an adversarial feature learning framework that leverages data augmentation to enhance robustness against domain shifts. Second, we introduce an unsupervised domain adaptation method that adapts models to target environments using only easily collected image data. In robotic tasks, visual domain shifts can often be detected from initial observations alone. Since collecting complete action-labeled episodes in new domains is expensive, adapting with only initial images greatly reduces data collection costs. To this end, we develop an adaptation strategy that relies solely on initial target-domain observations, eliminating the need for labeled demonstrations. Experimental results across both simulation and physical robot implementations demonstrate that our method preserves source domain performance while exhibiting enhanced resilience to visual perturbations, including varying lighting conditions, background modifications, and environmental distractors.
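The data requirement of the second stage can be sketched as follows. This is a minimal illustration, assuming episodes are stored as plain Python lists; the names `Sample`, `first_frames`, and `build_adaptation_set` are hypothetical and do not come from the paper. It shows only how action-labeled source frames and unlabeled initial target frames might be pooled for a domain discriminator, not the training procedure itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Image = List[List[float]]   # hypothetical stand-in for a camera frame
Action = List[float]        # hypothetical stand-in for a robot action


@dataclass
class Sample:
    image: Image
    domain: int                # 0 = source domain, 1 = target domain
    action: Optional[Action]   # None for target frames: no action labels


def first_frames(episodes: Sequence[Sequence[Image]]) -> List[Image]:
    """Keep only the initial observation of each non-empty episode."""
    return [episode[0] for episode in episodes if len(episode) > 0]


def build_adaptation_set(
    source_demos: Sequence[Sequence[Tuple[Image, Action]]],
    target_episodes: Sequence[Sequence[Image]],
) -> List[Sample]:
    """Pool labeled source frames with unlabeled initial target frames."""
    samples: List[Sample] = []
    # Source demonstrations contribute every frame together with its action.
    for demo in source_demos:
        for image, action in demo:
            samples.append(Sample(image=image, domain=0, action=action))
    # Target episodes contribute only their first frame, with no action.
    for image in first_frames(target_episodes):
        samples.append(Sample(image=image, domain=1, action=None))
    return samples


if __name__ == "__main__":
    source = [[([[0.0]], [0.1]), ([[0.2]], [0.3])]]   # one demo, two frames
    target = [[[[0.9]], [[0.8]]], [[[0.7]]]]          # two episodes, images only
    for sample in build_adaptation_set(source, target):
        print(sample.domain, sample.action)
```

Only the `domain` field would supervise the discriminator here; the `action` field stays available for the imitation loss on source frames, which matches the claim that no target-domain demonstrations are needed.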