Train Once, Apply Broadly: Low-Frequency Generative Augmentation for Driver Distraction Recognition under Photometric Shifts
DICHAO LIU, Longjiao Zhao, Mingkai Gu, HaoJiang Chen, Ying Ji
AI summary
Problem
Driver distraction recognition models degrade under real-world camera and lighting shifts, yet collecting labeled data for every deployment device is impractical. This paper addresses the need for robust single-source domain generalization without target-domain data.
Approach
Low-Frequency Generative Augmentation (LFGA) splits each image into a fixed high-frequency structure and a mutable low-frequency appearance, then uses multi-stage, feature-conditioned generators to create challenging but semantically consistent training views. The generators are trained adversarially to enforce decision consistency and feature decorrelation, and are discarded at inference.
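The frequency split and recombination described above can be illustrated with a minimal NumPy sketch. It makes several assumptions that do not come from the paper: the low-frequency base is taken to be a Gaussian blur of the image, the high-frequency structure is the residual, and the learned feature-conditioned generators are replaced by a hand-written random gain/bias/gamma perturbation. The names `blur`, `split_frequencies`, `perturb_low`, and `augment` are hypothetical.

```python
import numpy as np

def blur(img, sigma=4.0):
    """Separable Gaussian blur of an HxWxC float image (edge padding)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    out = img.astype(np.float64)
    for axis in (0, 1):  # blur rows, then columns
        pad = [(0, 0)] * out.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded
        )
    return out

def split_frequencies(img, sigma=4.0):
    """Assumed decomposition: low = blurred image, high = residual."""
    low = blur(img, sigma)
    high = img - low
    return low, high

def perturb_low(low, rng):
    """Stand-in for a learned generator: random photometric change
    applied to the low-frequency base only (an assumption)."""
    channels = low.shape[-1]
    gain = rng.uniform(0.6, 1.4, size=(1, 1, channels))
    bias = rng.uniform(-0.1, 0.1, size=(1, 1, channels))
    gamma = rng.uniform(0.7, 1.5)
    return np.clip(low, 0.0, 1.0) ** gamma * gain + bias

def augment(img, rng, sigma=4.0):
    """Re-render the low-frequency base, keep the original structure."""
    low, high = split_frequencies(img, sigma)
    return np.clip(perturb_low(low, rng) + high, 0.0, 1.0)
```

Calling `augment(img, np.random.default_rng(0))` on an image with values in [0, 1] returns a view with the same shape whose edges and fine texture come from the original, while overall brightness and color cast differ. In the method itself the perturbation is produced by trained generators rather than sampled from fixed ranges.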
Key results
- Improves cross-domain accuracy over strong SSDG baselines on synthetic photometric shifts
- Preserves in-domain classification accuracy
- Achieves strong zero-shot performance on real cross-device video data
- Adds negligible inference overhead (~3.7 ms per frame) since generators are training-only
Why it matters
Enables reliable, deployment-ready driver monitoring systems across diverse cameras and lighting without costly per-device data collection.
Abstract
Driver distraction recognition (DDR) degrades under deployment-time shifts in camera/ISP pipelines and illumination. We frame this as a single-source domain generalization (SSDG) problem: training on one labeled source domain and testing on unseen devices and lighting. Motivated by this, we propose Low-Frequency Generative Augmentation (LFGA), which separates each image into a fixed high-frequency structure and a re-renderable low-frequency base. Multi-stage, feature-conditioned generators perturb only the photometric low-frequency content and recombine it with the original high-frequency structure to yield “hard-but-correct” views that teach the model photometric invariance. Training imposes decision consistency via cross-entropy and logit matching, and promotes stage-wise separation along class-agnostic factors with a feature-dissimilarity loss. Generators are training-only. On two DDR benchmarks with synthetic cross-photometric shifts and a zero-shot real cross-device video test, LFGA improves cross-domain performance over strong SSDG and DDR baselines while preserving in-domain accuracy.
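The three training terms named in the abstract (cross-entropy, logit matching, and feature dissimilarity) might be combined roughly as in the sketch below. The abstract does not give their exact form, so everything here is an assumption: logit matching is written as a mean squared error between clean and augmented logits, feature dissimilarity as a squared cosine similarity between features of views from two generator stages, and the weights `lam_match` and `lam_dis` are hypothetical. How the terms are divided between the classifier and the generators is also not specified in this text.

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels."""
    lp = log_softmax(logits)
    return -lp[np.arange(len(labels)), labels].mean()

def logit_matching(logits_clean, logits_aug):
    """Assumed form: mean squared difference between the two logit sets."""
    return np.mean((logits_clean - logits_aug) ** 2)

def feature_similarity(feat_a, feat_b, eps=1e-8):
    """Assumed form: mean squared cosine similarity between two stages'
    features. Minimizing it pushes the stages apart."""
    a = feat_a / (np.linalg.norm(feat_a, axis=1, keepdims=True) + eps)
    b = feat_b / (np.linalg.norm(feat_b, axis=1, keepdims=True) + eps)
    return np.mean(np.sum(a * b, axis=1) ** 2)

def total_loss(logits_clean, logits_aug, feat_stage1, feat_stage2,
               labels, lam_match=1.0, lam_dis=0.1):
    """Decision consistency on both views plus a stage-separation term."""
    consistency = (
        cross_entropy(logits_clean, labels)
        + cross_entropy(logits_aug, labels)
        + lam_match * logit_matching(logits_clean, logits_aug)
    )
    return consistency + lam_dis * feature_similarity(feat_stage1, feat_stage2)
```

Here the cross-entropy terms keep both the original and the re-rendered view on the correct label, the matching term ties their predictions together, and the similarity penalty discourages different stages from producing the same kind of shift.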