Language-Guided Attribute Alignment and Semantic Consistency for Zero-Shot Domain Adaptation
Junhong Pan, Chenyi Jiang, Minxian Li, Haofeng Zhang
AI summary
Problem
Existing zero-shot domain adaptation methods rely on rigid, fixed prompts that miss fine-grained domain-specific attributes and suffer from unstable visual-linguistic alignment, limiting cross-domain transfer.
Approach
The authors propose LASC, which dynamically combines category labels with domain-relevant attributes to create adaptive text prompts, aligns them with visual features through cross-modal attention, and stabilizes representations using a memory bank that enforces intra-class compactness and inter-class separation.
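As a rough illustration of the prompt construction described above, the sketch below pairs one category label with a list of domain attribute phrases. The function name `build_prompts`, the template string, and the attribute phrases are all assumptions made for this example; the summary does not specify how LASC selects or weights attributes, so this shows only the basic idea of combining a label with attributes.

```python
def build_prompts(class_name, attributes, template="a photo of a {cls} {attr}"):
    """Combine one category label with each domain attribute phrase.

    Hypothetical helper: the template and attribute format are assumptions,
    not taken from the paper.
    """
    prompts = []
    for attr in attributes:
        # Strip stray whitespace so the template spacing stays consistent.
        prompts.append(template.format(cls=class_name, attr=attr.strip()))
    return prompts


# Hypothetical target-domain attributes (e.g. for a night or rain domain).
target_attributes = ["at night", "in heavy rain"]
prompts = build_prompts("car", target_attributes)
```

Each resulting string would then be passed to a text encoder; in this sketch that step is left out because the summary does not name the encoder.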
Key results
- Dynamic attribute-driven prompt generation captures fine-grained domain variations
- Memory-based consistency constraint enforces intra-class compactness and inter-class separation
- Significant performance gains over state-of-the-art baselines on multiple cross-domain benchmarks
- Robust zero-shot adaptation achieved without requiring any target-domain data
Why it matters
Enables reliable cross-domain visual understanding for safety-critical applications like autonomous driving and medical imaging where labeled target data is impractical to collect.
Abstract
In cross-domain visual understanding tasks, models often achieve strong performance on the source domain but suffer severe degradation when applied to target domains with substantial distribution shifts. This challenge is particularly prominent under the zero-shot domain adaptation setting, where adaptation must be achieved without access to target-domain samples and instead relies on language guidance to bridge the gap. However, existing approaches typically depend on fixed class names or handcrafted prompt templates, which fail to capture fine-grained semantic attributes present in the target domain. Moreover, the insufficient alignment between visual and linguistic modalities further constrains the transferability of semantic knowledge. To address these issues, we propose an attribute-driven cross-modal feature modulation framework, termed Language-guided Attribute alignment and Semantic Consistency (LASC). On the semantic side, we introduce an attribute-driven prompt generation module that dynamically combines category information with domain-relevant attributes to construct adaptive text prompts, which are aligned with visual features through cross-modal attention for enhanced semantic stability. Furthermore, we incorporate a semantic consistency constraint, where a memory bank enforces intra-class compactness and inter-class separation, ensuring robust discriminability across domains. Extensive experiments demonstrate that our approach achieves significant improvements over state-of-the-art baselines on multiple cross-domain benchmarks, and maintains strong adaptation ability without requiring any target-domain data. The code is available at https://github.com/JHP-3/LASC.
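The semantic consistency constraint can be pictured with a minimal sketch, given here under stated assumptions rather than as the authors' implementation. It assumes the memory bank stores one prototype per class, updated by an exponential moving average, and that the loss is a cosine-based compactness term plus a margin-based separation term. The class name `PrototypeMemoryBank`, the `momentum` and `margin` values, and the exact loss form are hypothetical; the abstract states only that a memory bank enforces intra-class compactness and inter-class separation.

```python
import math


def normalize(v):
    """Return v scaled to unit length (unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


class PrototypeMemoryBank:
    """Hypothetical memory bank holding one unit-norm prototype per class."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum  # assumed EMA factor, not from the paper
        self.prototypes = {}

    def update(self, label, feature):
        """Blend a new feature into the stored prototype for its class."""
        feature = normalize(feature)
        if label not in self.prototypes:
            self.prototypes[label] = feature
            return
        m = self.momentum
        old = self.prototypes[label]
        blended = [m * o + (1 - m) * f for o, f in zip(old, feature)]
        self.prototypes[label] = normalize(blended)

    def consistency_loss(self, label, feature, margin=0.2):
        """Compactness toward own prototype plus separation from the others."""
        # Intra-class compactness: zero when the feature matches its prototype.
        compact = 1.0 - cosine(feature, self.prototypes[label])
        # Inter-class separation: penalize similarity above the margin.
        others = [cosine(feature, p)
                  for k, p in self.prototypes.items() if k != label]
        separate = (sum(max(0.0, s - margin) for s in others) / len(others)
                    if others else 0.0)
        return compact + separate


bank = PrototypeMemoryBank()
bank.update("road", [1.0, 0.0])
bank.update("sky", [0.0, 1.0])
loss_near = bank.consistency_loss("road", [0.9, 0.1])  # close to "road"
loss_far = bank.consistency_loss("road", [0.1, 0.9])   # close to "sky"
```

In this toy setup a feature near its own class prototype yields a smaller loss than one that drifts toward another class, which is the behavior the constraint is meant to encourage.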