Temporal Transfer Learning for Traffic Optimization with Coarse-Grained Advisory Autonomy
Jung-Hoon Cho, Sirui Li, Jeongyun Kim, Cathy Wu
AI summary
Problem
Direct deep reinforcement learning fails to generalize across different advisory hold durations due to training brittleness, limiting the practical deployment of coarse-grained driving advisories for human drivers.
Approach
The authors introduce Temporal Transfer Learning (TTL) algorithms that select optimal source training tasks based on temporal similarities, enabling zero-shot policy transfer across a full range of hold durations without fine-tuning.
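To illustrate what selecting source tasks for zero-shot transfer can look like, here is a minimal sketch of a generic greedy selection. It is not the authors' TTL algorithm. It assumes a hypothetical table `perf[i][j]` that estimates how well a policy trained on source hold duration `i` performs on target hold duration `j`. The function name `greedy_source_selection` and the `budget` parameter are illustrative.

```python
def greedy_source_selection(perf, budget):
    """Greedily pick source tasks to maximize total performance over all targets.

    perf[i][j] is a hypothetical estimate of the zero-shot performance on
    target task j of a policy trained on source task i (an assumption of this
    sketch, not a quantity defined in the summary above).
    Returns the chosen source indices and the best performance per target.
    """
    n_src = len(perf)
    n_tgt = len(perf[0])
    selected = []
    best = [float("-inf")] * n_tgt  # best performance achieved so far per target

    for _ in range(min(budget, n_src)):
        def total_if_added(i):
            # Each target is served by the best policy available for it.
            return sum(max(best[j], perf[i][j]) for j in range(n_tgt))

        candidates = [i for i in range(n_src) if i not in selected]
        choice = max(candidates, key=total_if_added)
        selected.append(choice)
        best = [max(best[j], perf[choice][j]) for j in range(n_tgt)]

    return selected, best


# Toy, made-up numbers: three candidate hold durations used as both sources and targets.
toy_perf = [
    [1.0, 0.2, 0.0],
    [0.5, 0.9, 0.5],
    [0.0, 0.3, 0.9],
]
chosen, per_target = greedy_source_selection(toy_perf, budget=2)
```

In this toy table the middle task is picked first because it transfers moderately well to both neighbors, which mirrors the idea of exploiting temporal similarity between hold durations.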
Key results
- Proposes greedy and coarse-to-fine temporal transfer learning algorithms
- Achieves reliable zero-shot generalization across hold durations from 0.1 to 40 seconds
- Outperforms exhaustive and multitask reinforcement learning baselines in mixed-traffic simulations
- Validates coarse-grained advisory autonomy as a viable near-term traffic optimization strategy
Why it matters
Offers a robust, data-efficient pathway to deploy real-time driving advisories that improve urban traffic flow without requiring full vehicle automation.
Abstract
The recent development of connected and automated vehicle (CAV) technologies has spurred investigations to optimize dense urban traffic, maximizing vehicle speed and throughput. This article explores advisory autonomy, in which real-time driving advisories are issued to human drivers, thus achieving near-term performance of automated vehicles. Due to the complexity of traffic systems, recent studies of coordinating CAVs have leveraged deep reinforcement learning (RL). Coarse-grained advisory is formalized as zero-order holds, and we consider a range of hold durations from 0.1 to 40 s. However, despite the similarity of the higher frequency tasks for CAVs, a direct application of deep RL fails to generalize to advisory autonomy tasks. To overcome this, we employ zero-shot transfer, training policies on a set of source tasks (specific traffic scenarios with designated hold durations) and then evaluating the efficacy of these policies on different target tasks. We introduce temporal transfer learning (TTL) algorithms to select source tasks for zero-shot transfer, systematically leveraging the temporal structure to solve the full range of tasks. TTL selects the most suitable source tasks to maximize the performance of the range of tasks. We validate our algorithms on diverse mixed-traffic scenarios, demonstrating that TTL more reliably solves the tasks than baselines. This article underscores the potential of coarse-grained advisory autonomy with TTL in traffic flow optimization.
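The zero-order hold formalization can be sketched as follows: a new advisory is issued only at the start of each hold period and is repeated unchanged until the period ends. This is a minimal illustration, not the authors' implementation. The names `zero_order_hold`, `hold_steps_for`, and `advise`, and the 0.1 s simulation step, are assumptions made for the example.

```python
def hold_steps_for(hold_duration_s, sim_step_s=0.1):
    """Convert a hold duration in seconds to a whole number of simulation steps.

    The 0.1 s default step is an assumption of this sketch.
    """
    return max(1, round(hold_duration_s / sim_step_s))


def zero_order_hold(advise, observations, hold_steps):
    """Apply a policy under a zero-order hold.

    advise: hypothetical callable mapping an observation to an advisory.
    observations: one observation per simulation step.
    hold_steps: number of steps each advisory is held constant.
    """
    if hold_steps < 1:
        raise ValueError("hold_steps must be at least 1")
    actions = []
    held = None
    for t, obs in enumerate(observations):
        if t % hold_steps == 0:
            held = advise(obs)  # a fresh advisory at the start of each hold period
        actions.append(held)    # otherwise the previous advisory is repeated
    return actions


# Toy usage: the "policy" scales the observation by ten; advisories are held for 2 steps.
held_actions = zero_order_hold(lambda obs: obs * 10, [0, 1, 2, 3, 4], hold_steps=2)
```

With a hold of one step the policy is queried at every step, which corresponds to the fine-grained end of the range; longer holds give the coarser advisories intended for human drivers.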