Knowledge-Guided Graph Convolutional Network for Multi-Label Image Classification
Christine Dewi, Dhananjay Thiruvady, Stephen Abednego Philemon, Nayyar Zaidi
AI summary
Problem
Traditional CNNs for multi-label image classification struggle to capture semantic dependencies and contextual relationships between labels, leading to suboptimal performance in complex scenes.
Approach
The authors propose KGGCN, a hybrid framework that combines a Darknet53 visual backbone with a knowledge-guided graph convolutional network that uses ConceptNet5 embeddings to model and propagate semantic label relationships.
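To make the idea of propagating label relationships concrete, here is a minimal sketch of one graph-convolution step over a label graph. It is not the authors' implementation. It assumes a common design for GCN-based multi-label models in which the GCN turns label embeddings into per-label classifiers that are then scored against a pooled image feature; the summary does not confirm that KGGCN works exactly this way. The adjacency matrix, embeddings, weights, and image feature are hypothetical toy values standing in for ConceptNet5 embeddings and Darknet53 features, and plain Python lists are used only to keep the example free of dependencies.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCN layers."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(a_norm, h, w, activate=True):
    """One graph convolution: H' = act(A_norm @ H @ W), with a leaky ReLU."""
    out = matmul(matmul(a_norm, h), w)
    if activate:
        out = [[v if v > 0 else 0.2 * v for v in row] for row in out]
    return out

def label_scores(image_feature, classifiers):
    """Dot product of one image feature vector with each per-label classifier."""
    return [sum(f * c for f, c in zip(image_feature, clf)) for clf in classifiers]

# Hypothetical toy setup: 3 labels connected in a chain (0 - 1 - 2).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# Stand-ins for semantic label embeddings (2 dimensions per label).
label_emb = [[1.0, 0.0],
             [0.0, 1.0],
             [1.0, 1.0]]
# Hypothetical layer weights mapping 2-d embeddings to 3-d classifiers.
w = [[0.5, -0.5, 1.0],
     [1.0, 0.5, -1.0]]

a_norm = normalize_adjacency(adj)
# Final layer is left linear so the outputs can act as classifier weights.
classifiers = gcn_layer(a_norm, label_emb, w, activate=False)

# Stand-in for a pooled visual feature from the backbone (3 dimensions).
image_feature = [0.2, 0.4, 0.1]
scores = label_scores(image_feature, classifiers)
probs = [1.0 / (1.0 + math.exp(-s)) for s in scores]  # independent sigmoid per label
```

Each label's classifier becomes a weighted mix of its own embedding and those of its graph neighbours, which is how related labels can influence one another's predictions.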
Key results
- State-of-the-art mAP of 96.24% on VOC 2007 and 85.25% on COCO
- Superior performance in high-variability classes like bus, car, and person
- Consistent accuracy gains from semantic knowledge integration without added complexity
- Improved prediction completeness by correctly identifying related objects missed by baselines
Why it matters
Provides a robust, knowledge-enhanced approach for multi-label recognition that benefits computer vision applications requiring contextual understanding, such as medical diagnosis, scene recognition, and human attribute detection.
Abstract
Multi-label image classification is a significant challenge in computer vision due to the presence of multiple interconnected objects in a single image. Traditional convolutional neural networks (CNNs) often fail to capture semantic dependencies between labels, limiting performance in complex scenes. To address this issue, we propose a novel framework that combines a Knowledge-Guided Graph Convolutional Network (KGGCN) with a Darknet53 backbone to improve label dependency modelling. Our method fuses external semantic information from ConceptNet5, which allows the model to learn contextual relationships between labels. We evaluate this approach on two benchmark datasets, VOC 2007 and COCO, and obtain state-of-the-art results. KGGCN achieves a mean Average Precision (mAP) of 96.24% on VOC 2007 and 85.25% on COCO, outperforming existing methods in most categories. Ablation studies further show that integrating external knowledge contributes to higher mAP scores. Overall, KGGCN demonstrates the effectiveness of combining deep visual features with structured semantic knowledge for multi-label image classification.
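For readers unfamiliar with the reported metric, the following is a small sketch of how mAP can be computed for multi-label predictions. It uses the non-interpolated definition of average precision, averaged over classes; the exact evaluation protocol behind the reported figures is not stated in this abstract (VOC 2007, for instance, is traditionally scored with an 11-point interpolated variant), so treat this as an illustration of the idea only. The function names and the toy scores are hypothetical.

```python
def average_precision(scores, targets):
    """Non-interpolated AP for one class: mean precision at each positive's rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if targets[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positive examples contributes 0.0 in this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(score_matrix, target_matrix):
    """Average the per-class AP over all classes (columns)."""
    num_classes = len(score_matrix[0])
    aps = [
        average_precision([row[c] for row in score_matrix],
                          [row[c] for row in target_matrix])
        for c in range(num_classes)
    ]
    return sum(aps) / len(aps)

# Hypothetical toy data: 4 images (rows), 2 classes (columns).
toy_scores = [[0.9, 0.2],
              [0.8, 0.7],
              [0.3, 0.6],
              [0.1, 0.4]]
toy_targets = [[1, 0],
               [0, 1],
               [1, 1],
               [0, 0]]

ap_class0 = average_precision([r[0] for r in toy_scores], [r[0] for r in toy_targets])
ap_class1 = average_precision([r[1] for r in toy_scores], [r[1] for r in toy_targets])
toy_map = mean_average_precision(toy_scores, toy_targets)
```

In the toy data, the first class has one negative image ranked above a positive one, so its AP falls below 1, while the second class ranks both positives first and reaches an AP of 1.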