ICRA 2026

Unsupervised Domain Adaptation for Robust Imitation Learning under Visual Perturbations

Yasuhiro Kato, Thomas Westfechtel, Jen-Yen Chang, Naoki Morihira, Akinobu Hayashi, Tatsuya Harada, Takayuki Osa

AI summary

A two-stage adversarial framework enables imitation learning policies to adapt to new visual environments using only unlabeled initial images, improving robustness to lighting and background shifts without new labeled demonstrations.

Imitation learning, Unsupervised domain adaptation, Adversarial feature learning, Visual robustness, Robot manipulation

Problem

Vision-based imitation learning policies degrade under visual domain shifts such as lighting and background changes. Standard data augmentation alone does not ensure robustness or adaptation, since the policy is trained on a fixed demonstration dataset.

Approach

The method first uses adversarial feature learning to extract augmentation-invariant representations, then applies unsupervised domain adaptation: the policy is fine-tuned against a domain discriminator using only initial target-domain images.
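To make the two adversarial objectives concrete, here is a minimal pure-Python sketch of the losses. It assumes a binary indicator d (1 = augmented observation in stage 1 or target-domain observation in stage 2, 0 = clean or source) and a discriminator that outputs p = D(z) in (0, 1). The function names, the weight `lam`, and the subtraction form of the encoder loss are illustrative assumptions, not the authors' code; in a real framework the same min-max is usually trained with alternating updates or a gradient reversal layer.

```python
import math
from typing import Sequence


def bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy of discriminator outputs p = D(z) against indicators d."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, d in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total -= d * math.log(p) + (1 - d) * math.log(1.0 - p)
    return total / len(probs)


def discriminator_loss(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Discriminator: learn to predict the indicator d from the features (minimize BCE)."""
    return bce(probs, labels)


def encoder_policy_loss(
    bc_loss: float, probs: Sequence[float], labels: Sequence[int], lam: float = 0.1
) -> float:
    """Encoder + policy (hypothetical form): imitate the demonstrations (low bc_loss)
    while *maximizing* the discriminator's BCE, so features stop revealing d."""
    return bc_loss - lam * bce(probs, labels)


if __name__ == "__main__":
    labels = [0, 1, 0, 1]
    fooled = [0.5, 0.5, 0.5, 0.5]     # discriminator cannot tell d apart
    confident = [0.1, 0.9, 0.1, 0.9]  # discriminator separates d well
    print(round(bce(fooled, labels), 4))     # 0.6931 (= log 2)
    print(round(bce(confident, labels), 4))  # 0.1054 (= -log 0.9)
    print(round(encoder_policy_loss(1.0, fooled, labels), 4))  # 0.9307
```

With balanced labels, the best constant prediction is 0.5, which gives a BCE of log 2 ≈ 0.693. A discriminator that cannot do better than this has no usable information about d in the features, which is the state the encoder is pushed toward.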

Key results

Preserves source performance while enhancing resilience to lighting, background, and distractor shifts
Adapts to new domains using only initial unlabeled images, eliminating costly demonstration collection
Maximizes task-relevant information by minimizing the mutual information between the learned features and the augmentation and domain indicators
Validated across MuJoCo simulations and real-world bimanual robotic manipulation tasks
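The link between adversarial training and mutual information follows from a standard identity. The derivation below is generic and not necessarily the paper's exact objective. It assumes a binary indicator d (augmented vs. clean in stage 1, target vs. source in stage 2), encoder features z = f(x), a policy π, a weight λ > 0, and a discriminator D(d | z) flexible enough to represent any conditional distribution:

```latex
\begin{aligned}
\max_{D}\; \mathbb{E}_{x,d}\!\left[\log D\big(d \mid f(x)\big)\right]
  &= -H(d \mid z) = I(z; d) - H(d), \\
\min_{f,\pi}\; \Big\{ \mathcal{L}_{\mathrm{BC}}(f, \pi)
  + \lambda \max_{D}\, \mathbb{E}_{x,d}\!\left[\log D\big(d \mid f(x)\big)\right] \Big\}
  &= \min_{f,\pi}\; \Big\{ \mathcal{L}_{\mathrm{BC}}(f, \pi) + \lambda\, I(z; d) \Big\} - \lambda H(d).
\end{aligned}
```

By Gibbs' inequality, the inner maximum is reached at D(d | z) = p(d | z). Because H(d) does not depend on f, training the encoder to fool the best discriminator is equivalent to adding λ I(z; d) to the imitation loss. With balanced labels, H(d) = log 2.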

Why it matters

Enables cost-effective deployment of vision-based robot policies in dynamic real-world settings without requiring expensive re-demonstration data collection.

Abstract

Vision-based robot manipulation systems often suffer from performance degradation under domain shifts in visual inputs. While data augmentation is commonly employed in reinforcement learning, its application in imitation learning remains relatively underexplored. Our preliminary experiments indicate that simply incorporating augmentation techniques does not yield effective improvements in imitation learning. To address this challenge, we propose a two-stage learning process. First, we develop an adversarial feature learning framework that leverages data augmentation to enhance robustness against domain shifts. Second, we introduce an unsupervised domain adaptation method that adapts models to target environments using only easily collected image data. In robotic tasks, visual domain shifts can often be detected from initial observations alone. Since collecting complete action-labeled episodes in new domains is expensive, adapting with only initial images greatly reduces data collection costs. To this end, we develop an adaptation strategy that relies solely on initial target-domain observations, eliminating the need for labeled demonstrations. Experimental results across both simulation and physical robot implementations demonstrate that our method preserves source domain performance while exhibiting enhanced resilience to visual perturbations, including varying lighting conditions, background modifications, and environmental distractors.
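The abstract's key practical point is that adaptation needs only initial target-domain images. The plain-Python sketch below assembles a balanced discriminator batch for that stage under stated assumptions. Source-domain samples are taken from the first frame of each existing demonstration, so both domains show the same task phase. Target samples are the unlabeled initial images. The helper names `initial_frames` and `build_domain_batch` are hypothetical, and no target-domain actions are read anywhere.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Obs = TypeVar("Obs")  # any image representation: array, tensor, file path, ...


def initial_frames(episodes: Sequence[Sequence[Obs]]) -> List[Obs]:
    """Keep only the first observation of each non-empty episode."""
    return [ep[0] for ep in episodes if len(ep) > 0]


def build_domain_batch(
    source_episodes: Sequence[Sequence[Obs]],
    target_initial_images: Sequence[Obs],
    seed: int = 0,
) -> List[Tuple[Obs, int]]:
    """Return shuffled (image, domain_label) pairs with 0 = source and 1 = target.

    Assumption: source samples are the initial frames of existing demonstrations.
    Target samples are unlabeled initial images; no target actions are used.
    Both domains are subsampled to the same size so the labels are balanced.
    """
    src = initial_frames(source_episodes)
    tgt = list(target_initial_images)
    n = min(len(src), len(tgt))
    rng = random.Random(seed)  # seeded for reproducible subsampling and shuffling
    batch = [(x, 0) for x in rng.sample(src, n)] + [(x, 1) for x in rng.sample(tgt, n)]
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    demos = [["s0_t0", "s0_t1", "s0_t2"], ["s1_t0", "s1_t1"]]  # source episodes (frames)
    target = ["t_img0", "t_img1", "t_img2"]                     # unlabeled target images
    for image, label in build_domain_batch(demos, target):
        print(label, image)
```

In this sketch, the pairs would feed the domain discriminator during adaptation, while the imitation loss keeps using the labeled source demonstrations. Balancing the two domains keeps a chance-level discriminator at a cross-entropy of log 2.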

Index terms

Imitation Learning