Task-Aware and Structure-Knowledge-Guided Quantization for End-To-End YOLO Object Detection
MingHua Zhu, Liangwei Li, Shunan Zhou, Jingfei Jiang, Jinwei Xu
AI summary
Problem
Existing post-training quantization methods, optimized for classification, cause severe performance drops when applied to YOLO object detectors due to high regression sensitivity, long-tailed activation distributions, and the complex, heterogeneous detection head.
Approach
TASKQ combines a sparse quantization strategy to handle skewed activations, a detection-aware regularization loss based on IoU to guide fine-tuning, and a head-wise quantization scheme aligned with the detector's multi-scale architecture.
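To make the skewed-activation issue concrete, the following minimal sketch quantizes SiLU outputs to 4 bits with two range choices: the full min/max range, and a range clipped at the 99.9th percentile. It is not the TASKQ sparse quantization strategy, whose details are not given here; percentile clipping is a generic stand-in, and the synthetic data, the outlier values, and the helper names (`silu`, `uniform_quantize`, `mse`) are all assumptions made for illustration.

```python
import math
import random


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def uniform_quantize(values, lo, hi, bits):
    """Asymmetric uniform quantization onto 2**bits levels spanning [lo, hi]."""
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)        # saturate values outside the range
        q = round((clipped - lo) / scale)    # integer code in [0, levels]
        out.append(lo + q * scale)           # dequantized value
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic pre-activations: a Gaussian bulk plus a few large outliers
# (hypothetical values chosen only to create a long tail).
random.seed(0)
pre = [random.gauss(0.0, 1.0) for _ in range(10000)] + [20.0, 25.0, 30.0]
acts = [silu(x) for x in pre]

# Option 1: range taken from the observed min/max, so outliers set the step size.
full = uniform_quantize(acts, min(acts), max(acts), bits=4)

# Option 2: upper bound clipped at the 99.9th percentile.
clip_hi = sorted(acts)[int(0.999 * len(acts))]
clipped = uniform_quantize(acts, min(acts), clip_hi, bits=4)

# Compare the error on the bulk of the distribution (values below the clip point).
bulk = [i for i, a in enumerate(acts) if a <= clip_hi]
mse_full = mse([acts[i] for i in bulk], [full[i] for i in bulk])
mse_clip = mse([acts[i] for i in bulk], [clipped[i] for i in bulk])

print(f"bulk MSE, min/max range:    {mse_full:.5f}")
print(f"bulk MSE, percentile range: {mse_clip:.5f}")
```

In this toy setting a handful of outliers stretch the min/max range so far that most activations collapse onto one or two quantization levels. Clipping the range restores resolution for the bulk, at the cost of saturating the tail.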
Key results
- Identifies three core YOLO quantization challenges: task sensitivity, long-tail activations, and head heterogeneity
- Introduces sparse quantization to preserve precision for skewed activation distributions
- Develops IoU-aware task regularization to guide quantization parameter optimization
- Achieves state-of-the-art post-training quantization performance across YOLO variants in low-bit regimes
Why it matters
Enables accurate, low-bit deployment of real-time object detection models on resource-constrained edge devices.
Abstract
The YOLO series of models are pivotal for real-time object detection, yet their deployment on resource-constrained edge devices necessitates effective model compression. Post-Training Quantization (PTQ) offers a promising, low-cost solution, but existing methods, primarily designed for classification tasks, often lead to significant performance degradation when applied to YOLO models. In this paper, we systematically analyze the key challenges in quantizing YOLO architectures. We identify three primary obstacles: (1) the high sensitivity of detection tasks to quantization errors, exacerbated by the non-linear IoU metric; (2) the pronounced long-tail distribution of activations, particularly with the SiLU function, which complicates low-bit quantization; and (3) the structural heterogeneity of the multi-scale, multi-task detection head, which renders conventional block-wise quantization strategies ineffective. To address these challenges, we propose a novel framework, Task-Aware and Structure-Knowledge-guided Quantization (TASKQ). Our framework introduces three key components: a sparse quantization strategy to mitigate the impact of long-tailed activations, a Detection-aware Task Regularization (DTR) mechanism that incorporates IoU-based loss to guide parameter fine-tuning, and a Scale-and-Task-Aware Head-wise Quantization (STAHQ) scheme that aligns quantization granularity with the head’s functional structure. Extensive experiments on various YOLO models demonstrate that TASKQ significantly outperforms existing PTQ methods, especially in low-bit scenarios, establishing a new state-of-the-art for end-to-end YOLO quantization.
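As a rough illustration of the first obstacle, the sketch below applies the same absolute coordinate error to a small and a large box and compares the resulting IoU. The box sizes, the 4-pixel shift, and the helper names `iou` and `shifted` are hypothetical choices made for this example; they are not taken from the paper.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted(box, d):
    """Translate a box by d pixels along both axes (a stand-in for regression error)."""
    return (box[0] + d, box[1] + d, box[2] + d, box[3] + d)


small = (0.0, 0.0, 16.0, 16.0)     # 16 x 16 ground-truth box
large = (0.0, 0.0, 128.0, 128.0)   # 128 x 128 ground-truth box
shift = 4.0                        # identical absolute error for both boxes

iou_small = iou(small, shifted(small, shift))  # 144 / 368, about 0.39
iou_large = iou(large, shifted(large, shift))  # 15376 / 17392, about 0.88

print(f"IoU after {shift}-pixel shift, small box: {iou_small:.3f}")
print(f"IoU after {shift}-pixel shift, large box: {iou_large:.3f}")
```

The same 4-pixel error leaves the large box well above a typical 0.5 matching threshold while pushing the small box below it. This is one way to see why a regression error that looks minor in coordinate space can have a disproportionate effect on detection metrics.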