ICRA 2026

Knowledge-Guided Graph Convolutional Network for Multi-Label Image Classification

Christine Dewi, Dhananjay Thiruvady, Stephen Abednego Philemon, Nayyar Zaidi

AI summary

Integrating external semantic knowledge from ConceptNet5 into a graph convolutional network significantly improves multi-label image classification accuracy by capturing label dependencies that traditional CNNs miss.

Multi-label classification, Graph Convolutional Networks, Knowledge-guided learning, ConceptNet5, Darknet53, Semantic dependencies

Problem

Traditional CNNs for multi-label image classification struggle to capture semantic dependencies and contextual relationships between labels, leading to suboptimal performance in complex scenes.

Approach

The authors propose KGGCN, a hybrid framework that combines a Darknet53 visual backbone with a knowledge-guided graph convolutional network that uses ConceptNet5 embeddings to model and propagate semantic label relationships.
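
The general idea of propagating label relationships through a graph can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the widely used symmetric-normalised graph convolution rule, treats each row of `h` as a label embedding (standing in for a ConceptNet5 vector), and assumes the graph output is used as one classifier vector per label that is dotted with an image feature (standing in for a Darknet53 output). All function names, the adjacency matrix, and the dimensions are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square label adjacency matrix."""
    n = len(adj)
    # Add self-loops so every label keeps part of its own embedding.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, h, w, activate=True):
    """One graph convolution: A_norm @ H @ W, optionally followed by ReLU."""
    out = matmul(matmul(a_norm, h), w)
    if activate:
        out = [[max(0.0, v) for v in row] for row in out]
    return out


def label_scores(image_feature, label_classifiers):
    """Sigmoid of the dot product between the image feature and each label vector."""
    scores = []
    for clf in label_classifiers:
        logit = sum(f * c for f, c in zip(image_feature, clf))
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


# Hypothetical toy setup: 3 labels, 2-dimensional embeddings and features.
adjacency = [[0.0, 1.0, 0.0],   # label 0 is related to label 1
             [1.0, 0.0, 1.0],   # label 1 is related to labels 0 and 2
             [0.0, 1.0, 0.0]]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # placeholder label vectors
weights = [[1.0, 0.0], [0.0, 1.0]]                 # identity, for readability

a_norm = normalize_adjacency(adjacency)
classifiers = gcn_layer(a_norm, embeddings, weights, activate=False)
probabilities = label_scores([0.5, -0.5], classifiers)
```

Because each row of `a_norm` mixes a label's embedding with those of its neighbours, related labels end up with similar classifier vectors, which is the mechanism by which a label graph can encourage co-occurring objects to be predicted together.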

Key results

State-of-the-art mAP of 96.24% on VOC 2007 and 85.25% on COCO
Superior performance in high-variability classes like bus, car, and person
Consistent accuracy gains from semantic knowledge integration without added complexity
Improved prediction completeness by correctly identifying related objects missed by baselines
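
For readers unfamiliar with the mAP metric quoted above, the following sketch shows one common way to compute it: average precision is calculated per class from ranked prediction scores, then averaged over classes. The exact evaluation protocol used in the paper is not stated in this summary (VOC 2007, for instance, is often scored with an interpolated variant), so this non-interpolated version and its function names are illustrative assumptions only.

```python
def average_precision(scores, targets):
    """Non-interpolated AP for one class.

    scores:  predicted confidence per image
    targets: 1 if the class is present in the image, else 0
    """
    ranked = sorted(zip(scores, targets), key=lambda p: p[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            # Precision measured at the rank of each true positive.
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, target_matrix):
    """Mean of per-class AP; rows are images, columns are classes."""
    n_classes = len(score_matrix[0])
    per_class = []
    for c in range(n_classes):
        class_scores = [row[c] for row in score_matrix]
        class_targets = [row[c] for row in target_matrix]
        per_class.append(average_precision(class_scores, class_targets))
    return sum(per_class) / n_classes


# Hypothetical toy data: 3 images, 2 classes.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
toy_targets = [[1, 0], [0, 1], [1, 1]]
toy_map = mean_average_precision(toy_scores, toy_targets)
```

In this toy case the second class is ranked perfectly while the first has one negative image ranked above a positive one, so the mean falls below 1.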

Why it matters

Provides a robust, knowledge-enhanced approach for multi-label recognition that benefits computer vision applications requiring contextual understanding, such as medical diagnosis, scene recognition, and human attribute detection.

Abstract

Multi-label image classification is a significant challenge in computer vision due to the presence of multiple interconnected objects in a single image. Traditional convolutional neural networks (CNNs) often fail to capture semantic dependencies between labels, limiting performance in complex scenes. To address this issue, we propose a novel framework that combines a Knowledge-Guided Graph Convolutional Network (KGGCN) with a Darknet53 backbone to improve label dependency modelling. Our method fuses external semantic information from ConceptNet5, which allows the model to learn contextual relationships between labels. We evaluate this approach on two benchmark datasets, VOC 2007 and COCO, and obtain state-of-the-art results. KGGCN achieves a mean Average Precision (mAP) of 96.24% on VOC 2007 and 85.25% on COCO, outperforming existing methods in most categories. Ablation studies further show that integrating external knowledge contributes to higher mAP scores. Overall, KGGCN demonstrates the effectiveness of combining deep visual features with structured semantic knowledge for multi-label image classification.

Index terms

Deep Learning for Visual Perception; Computer Vision for Automation; Visual Learning