MoE-Powered Fast VLMs Via Curriculum Learning-Based Knowledge Distillation: Taming Regular and Corner Cases in Autonomous Driving
Xue Zhao, Zhou Fang
AI summary
Problem
Large Vision-Language Models for autonomous driving suffer from high latency, and simply shrinking them degrades their ability to handle both regular and rare corner cases effectively.
Approach
The authors propose a Curriculum Learning-based Knowledge Distillation framework that combines a Mixture-of-Experts architecture with a two-granularity hardness mining strategy and a progressive release distillation schedule to balance efficiency and accuracy.
Key results
- Twofold increase in inference speed over existing approaches
- Maintains comparable performance on regular and corner cases
- MoE architecture preserves small model expressiveness
- H2G strategy adaptively mines hard tokens and samples
Why it matters
Enables real-time, resource-efficient deployment of autonomous driving systems without compromising safety or decision-making accuracy.
Abstract
Autonomous driving has advanced significantly with the integration of large Vision-Language Models (VLMs), which excel in understanding and analyzing driving data. However, existing VLMs face challenges, particularly in terms of latency, which is crucial for real-time driving tasks. While shrinking the model size can reduce latency, it also limits the model’s ability to handle both regular and corner cases effectively. To address this challenge, we propose the Curriculum Learning-based Knowledge Distillation (CLKD) framework. CLKD enhances student model performance through three key innovations: (1) integration of a Mixture-of-Experts (MoE) architecture to preserve model expressiveness; (2) Hardness-explored at Two Granularities (H2G), which dynamically identifies easy and difficult samples at both instance and feature levels; and (3) a Progressive Release Distillation strategy that gradually reduces reliance on the teacher model, thereby fostering the student’s autonomy and improving its generalization capability in complex driving scenarios. In experiments on real-world data, CLKD achieves a twofold increase in speed compared to existing approaches while maintaining comparable performance.
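The abstract does not give the formula behind the Progressive Release Distillation strategy, so the following is only a minimal sketch of the general idea of gradually reducing reliance on the teacher. Everything in it is an assumption made for illustration: the cosine shape of the schedule, the temperature value, the blend of a KL distillation term with a cross-entropy task term, and the names `teacher_weight` and `blended_loss`. It should not be read as the authors' implementation.

```python
import math


def teacher_weight(step, total_steps, start=1.0, end=0.0):
    """Hypothetical cosine schedule: the weight on the distillation
    term decays from `start` to `end` as training progresses."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the ground-truth class index."""
    return -math.log(probs[label] + eps)


def blended_loss(student_logits, teacher_logits, label,
                 step, total_steps, temperature=2.0):
    """Assumed blend: alpha * distillation + (1 - alpha) * task loss,
    where alpha shrinks over time so the student leans less on the teacher."""
    alpha = teacher_weight(step, total_steps)
    task = cross_entropy(softmax(student_logits), label)
    distill = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return alpha * distill + (1.0 - alpha) * task
```

Under these assumptions the loss at step 0 is purely the distillation term, and by the final step it is purely the task loss. A linear or step-wise schedule would fit the abstract's description equally well.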