ICRA 2026

Train Once, Apply Broadly: Low-Frequency Generative Augmentation for Driver Distraction Recognition under Photometric Shifts

DICHAO LIU, Longjiao Zhao, Mingkai Gu, HaoJiang Chen, Ying Ji


AI summary


LFGA boosts cross-device and cross-lighting robustness in driver distraction recognition by generating photometrically shifted, low-frequency training views while preserving semantic structure.

Driver distraction recognition, Single-source domain generalization, Photometric shift, Generative augmentation, Low-frequency decomposition, Domain robustness

Problem

Driver distraction recognition models degrade under real-world camera and lighting shifts, yet collecting labeled data for every deployment device is impractical. This paper addresses the need for robust single-source domain generalization without target-domain data.

Approach

LFGA splits each image into a fixed high-frequency structure component and a mutable low-frequency appearance component. Multi-stage, feature-conditioned generators re-render only the low-frequency part, producing challenging but semantically consistent training views. Training enforces decision consistency (cross-entropy and logit matching) and stage-wise feature dissimilarity, and the generators are discarded at inference.
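To make the split-and-recombine idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes an ideal circular low-pass filter in the Fourier domain (the cutoff `radius` is a hypothetical parameter), and it replaces the learned, feature-conditioned generators with a random per-channel gain and bias applied to the low-frequency base. The high-frequency residual is added back unchanged, so edges and shapes survive while color and brightness shift.

```python
import numpy as np


def split_frequency(img, radius=8.0):
    """Split an (H, W) or (H, W, C) image into low- and high-frequency parts.

    Assumption: an ideal circular low-pass mask in the 2-D Fourier domain.
    By construction, low + high reconstructs img exactly (up to float error).
    """
    h, w = img.shape[:2]
    fy = np.fft.fftfreq(h) * h  # integer frequency indices along rows
    fx = np.fft.fftfreq(w) * w  # integer frequency indices along columns
    mask = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2) <= radius
    if img.ndim == 3:
        mask = mask[:, :, None]  # broadcast the same mask over channels
    spectrum = np.fft.fft2(img, axes=(0, 1))
    low = np.real(np.fft.ifft2(spectrum * mask, axes=(0, 1)))
    high = img - low  # structure: everything the low-pass removed
    return low, high


def rerender_low(low, gain, bias):
    """Hypothetical stand-in for a learned generator: per-channel
    contrast (gain) and brightness (bias) change on the smooth base."""
    return low * gain + bias


def make_view(img, gain, bias, radius=8.0):
    """Perturb only the low-frequency base, then re-add the untouched
    high-frequency structure and clip to the [0, 1] intensity range."""
    low, high = split_frequency(img, radius)
    return np.clip(rerender_low(low, gain, bias) + high, 0.0, 1.0)


rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))            # toy RGB image in [0, 1)
gain = rng.uniform(0.6, 1.4, size=3)     # hypothetical photometric ranges
bias = rng.uniform(-0.2, 0.2, size=3)
view = make_view(img, gain, bias)
```

With `gain = 1` and `bias = 0` the view reduces to the original image, which is a convenient sanity check that only the low-frequency base is being altered.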

Key results

Improves cross-domain accuracy over strong SSDG and DDR baselines on two DDR benchmarks with synthetic photometric shifts
Preserves in-domain classification accuracy
Achieves strong zero-shot performance on real cross-device video data
Adds negligible inference overhead (~3.7 ms per frame) since generators are training-only

Why it matters

Enables reliable, deployment-ready driver monitoring systems across diverse cameras and lighting without costly per-device data collection.

Abstract

Driver distraction recognition (DDR) degrades under deployment-time shifts in camera/ISP pipelines and illumination. We frame this as a single-source domain generalization (SSDG) problem: training on one labeled source domain and testing on unseen devices and lighting. Motivated by this, we propose Low-Frequency Generative Augmentation (LFGA), which separates each image into a fixed high-frequency structure and a re-renderable low-frequency base. Multi-stage, feature-conditioned generators perturb only the photometric low-frequency content and recombine it with the original high-frequency structure, yielding “hard-but-correct” views that teach the model photometric invariances. Training imposes decision consistency via cross-entropy and logit matching, and promotes stage-wise separation along class-agnostic factors with a feature-dissimilarity loss. The generators are used only during training. On two DDR benchmarks with synthetic cross-photometric shifts and a zero-shot real cross-device video test, LFGA improves cross-domain performance over strong SSDG and DDR baselines while preserving in-domain accuracy.
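The training objectives named in the abstract can be sketched as follows. This is an assumption-laden illustration, not the paper's loss: "logit matching" is taken here to be the mean squared difference between source-view and augmented-view logits, the feature-dissimilarity term is taken to be the mean absolute cosine similarity between features of different generator stages (lower means more separated), and the weights `lam_match` and `lam_dis` are hypothetical. Which parameters (classifier or generators) each term updates is not specified in this summary, so the sketch only computes loss values.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed stably via log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def logit_matching(logits_src, logits_aug):
    """Assumed form: mean squared difference between the two views' logits."""
    return np.mean((logits_src - logits_aug) ** 2)


def feature_dissimilarity(stage_feats):
    """Assumed form: mean |cosine similarity| over all pairs of stages.
    Each entry of stage_feats is an (N, D) feature matrix for one stage."""
    normed = [f / np.linalg.norm(f, axis=1, keepdims=True) for f in stage_feats]
    sims = []
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            sims.append(np.abs((normed[i] * normed[j]).sum(axis=1)).mean())
    return float(np.mean(sims))


def total_loss(logits_src, logits_augs, labels, stage_feats,
               lam_match=1.0, lam_dis=0.1):
    """Cross-entropy on every view, plus logit matching between the source
    view and each augmented view, plus the stage dissimilarity term."""
    ce = cross_entropy(logits_src, labels)
    ce += sum(cross_entropy(l, labels) for l in logits_augs)
    match = sum(logit_matching(logits_src, l) for l in logits_augs)
    return ce + lam_match * match + lam_dis * feature_dissimilarity(stage_feats)


rng = np.random.default_rng(0)
labels = np.array([0, 2, 1, 3])
logits_src = rng.normal(size=(4, 4))
# One augmented view per hypothetical generator stage.
logits_augs = [logits_src + 0.1 * rng.normal(size=(4, 4)) for _ in range(3)]
stage_feats = [rng.normal(size=(4, 16)) for _ in range(3)]
loss = total_loss(logits_src, logits_augs, labels, stage_feats)
```

In an actual training loop these terms would be computed in an autodiff framework so gradients can flow to the classifier and generators; NumPy is used here only to show the arithmetic.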

Index terms

Deep Learning for Visual Perception; Intelligent Transportation Systems; Recognition