ICRA 2026

Task-Aware and Structure-Knowledge-Guided Quantization for End-To-End YOLO Object Detection

MingHua Zhu, Liangwei Li, Shunan Zhou, Jingfei Jiang, Jinwei Xu

AI summary

TASKQ establishes a new state-of-the-art for low-bit post-training quantization of YOLO models by directly addressing their unique task sensitivity and structural heterogeneity.

Post-Training Quantization; YOLO; Object Detection; Model Compression; Edge Deployment; Sparse Quantization

Problem

Existing post-training quantization methods, optimized for classification, cause severe performance drops when applied to YOLO object detectors due to high regression sensitivity, long-tailed activation distributions, and the complex, heterogeneous detection head.
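The long-tail issue can be illustrated with a small, self-contained sketch. It is not taken from the paper: the synthetic activations, the injected outliers, the 4-bit setting, and the 99th-percentile clipping are all assumptions chosen only to show why a few large SiLU outputs hurt a uniform quantizer.

```python
import math
import random

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def quantize(values, lo, hi, bits):
    """Uniform quantization over [lo, hi] with 2**bits levels, then dequantize."""
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)
        out.append(lo + round((clipped - lo) / scale) * scale)
    return out

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
pre = [rng.gauss(0.0, 1.0) for _ in range(10000)]
# Hypothetical outliers: a handful of large pre-activations forms the long tail.
pre[:10] = [rng.uniform(20.0, 40.0) for _ in range(10)]
acts = [silu(x) for x in pre]

lo, hi = min(acts), max(acts)
p99 = sorted(acts)[int(0.99 * len(acts))]  # assumed clipping threshold

minmax_q = quantize(acts, lo, hi, bits=4)    # range stretched by the outliers
clipped_q = quantize(acts, lo, p99, bits=4)  # range fitted to the bulk

# Compare the error on the bulk of the distribution (values up to p99).
bulk = [i for i, a in enumerate(acts) if a <= p99]
bulk_err_minmax = mse([acts[i] for i in bulk], [minmax_q[i] for i in bulk])
bulk_err_clipped = mse([acts[i] for i in bulk], [clipped_q[i] for i in bulk])
print(bulk_err_minmax, bulk_err_clipped)
```

With the min-max range, nearly all 16 levels are spent on the empty region between the bulk and the outliers, so the bulk error is far larger than with the clipped range. Clipping, in turn, distorts the outliers themselves, which is the tension any long-tail-aware scheme has to resolve.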

Approach

TASKQ combines a sparse quantization strategy to handle skewed activations, a detection-aware regularization loss based on IoU to guide fine-tuning, and a head-wise quantization scheme aligned with the detector's multi-scale architecture.
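As a rough illustration of what an IoU-based regularization term could look like, the sketch below computes the mean of 1 - IoU between boxes predicted by a full-precision model and by its quantized counterpart. This is an assumed, generic formulation and not the loss defined in the paper; the function names, the box format `(x1, y1, x2, y2)`, and the sample boxes are hypothetical.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_regularizer(fp_boxes, q_boxes):
    """Mean (1 - IoU) over matched full-precision / quantized box pairs."""
    return sum(1.0 - iou(a, b) for a, b in zip(fp_boxes, q_boxes)) / len(fp_boxes)

# Hypothetical predictions: quantization shifts every box by 2 pixels.
large_fp, large_q = (10, 10, 50, 50), (12, 12, 52, 52)
small_fp, small_q = (100, 100, 110, 110), (102, 102, 112, 112)

iou_large = iou(large_fp, large_q)  # about 0.82
iou_small = iou(small_fp, small_q)  # about 0.47
loss = iou_regularizer([large_fp, small_fp], [large_q, small_q])
print(iou_large, iou_small, loss)
```

The same 2-pixel shift costs the small box far more IoU than the large one, which a plain coordinate-wise error would not reveal. That difference is the motivation for guiding quantization parameters with a detection-level signal.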

Key results

Identifies three core YOLO quantization challenges: task sensitivity, long-tail activations, and head heterogeneity
Introduces sparse quantization to preserve precision for skewed activation distributions
Develops IoU-aware task regularization to guide quantization parameter optimization
Achieves state-of-the-art post-training quantization performance across YOLO variants in low-bit regimes

Why it matters

Enables accurate, low-bit deployment of real-time object detection models on resource-constrained edge devices.

Abstract

The YOLO series of models are pivotal for real-time object detection, yet their deployment on resource-constrained edge devices necessitates effective model compression. Post-Training Quantization (PTQ) offers a promising, low-cost solution, but existing methods, primarily designed for classification tasks, often lead to significant performance degradation when applied to YOLO models. In this paper, we systematically analyze the key challenges in quantizing YOLO architectures. We identify three primary obstacles: (1) the high sensitivity of detection tasks to quantization errors, exacerbated by the non-linear IoU metric; (2) the pronounced long-tail distribution of activations, particularly with the SiLU function, which complicates low-bit quantization; and (3) the structural heterogeneity of the multi-scale, multi-task detection head, which renders conventional block-wise quantization strategies ineffective. To address these challenges, we propose a novel framework, Task-Aware and Structure-Knowledge-guided Quantization (TASKQ). Our framework introduces three key components: a sparse quantization strategy to mitigate the impact of long-tailed activations, a Detection-aware Task Regularization (DTR) mechanism that incorporates IoU-based loss to guide parameter fine-tuning, and a Scale-and-Task-Aware Head-wise Quantization (STAHQ) scheme that aligns quantization granularity with the head’s functional structure. Extensive experiments on various YOLO models demonstrate that TASKQ significantly outperforms existing PTQ methods, especially in low-bit scenarios, establishing a new state-of-the-art for end-to-end YOLO quantization.
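The point about quantization granularity in a heterogeneous head can be sketched with a toy comparison. It is not the STAHQ scheme itself: the two branch names, their value ranges, and the symmetric 4-bit quantizer are hypothetical, picked only to show what happens when branches with very different ranges share one quantization scale.

```python
import random

def quant_mse(values, range_max, bits=4):
    """MSE of symmetric uniform quantization over [-range_max, range_max]."""
    qmax = 2 ** (bits - 1) - 1
    scale = range_max / qmax
    err = 0.0
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))
        err += (v - q * scale) ** 2
    return err / len(values)

rng = random.Random(0)
# Hypothetical head branches with very different output ranges.
branches = {
    "box_regression": [rng.uniform(-8.0, 8.0) for _ in range(2000)],
    "classification": [rng.uniform(-0.5, 0.5) for _ in range(2000)],
}

# One scale shared by the whole head: the wide branch dictates the range.
shared_max = max(abs(v) for vals in branches.values() for v in vals)
shared = {name: quant_mse(vals, shared_max) for name, vals in branches.items()}

# One scale per branch: each range is fitted to its own values.
per_branch = {
    name: quant_mse(vals, max(abs(v) for v in vals))
    for name, vals in branches.items()
}
print(shared, per_branch)
```

Under the shared scale, the narrow branch collapses to almost a single quantization level, while a per-branch scale keeps its error small and leaves the wide branch unchanged.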

Index terms

Deep Learning for Visual Perception; Visual Learning; Computer Vision for Transportation