ICRA 2024

Automatic Captioning Based on Visible and Infrared Images

Yan Wang, Shuli Lou, Kai Wang, Xiaohu Yuan, Huaping Liu

Abstract

In this paper, we tackle the task of image captioning by exploiting the complementarity of visible light images and infrared images. To address this problem, we propose an RGB-IR image fusion captioning model, which can take full advantage of visible light images and infrared images under different conditions. Meanwhile, we develop a wearable environment-assisted system. In addition, we collect and annotate a new dataset containing 3510 pairs of RGB-IR images to support model training. Finally, we conduct extensive experiments to evaluate the model and system. Experimental results show that our new method and system significantly outperform baselines on multiple metrics and have potential practical value.

Index terms

Automation Technologies for Smart Cities; Human-Centered Automation