DCSANet: Dual Cross-Channel and Spatial Attention Make RGB-T Object Detection Better
Xiaoxiong Lan, Shenghao Liu, Zhiyong Zhang, Changzhen Qiu
Abstract
Multimodal image pairs can make object detection more reliable in challenging environments, so RGB-T object detection has gained extensive attention over the past decade. To alleviate the complementarity of the visible and thermal modality, we propose a novel lightweight Feature Enhancement- fusion Module (FEM), which is composed of the Channel Enhancement-fusion Unit (CEU) and Spatial Enhancement- fusion Unit (SEU) by extending the attention mechanism to operate on two modalities. CEU is used to exploit the com- plementarity and alleviate the data imbalance by combining internal and global channel attention. Additionally, SEU is utilized to guide the model to pay more attention to the regions of interest. By incorporating FEM, enhanced and fused features are obtained, leading to improved performance. The effectiveness and generalizability of FEM are validated by two public datasets and our proposed DCSANet achieves compet- itive performance while maintaining high speed (+%7.0 on LLVIP and +1.2% on FLIR in mAP). Moreover, we conducted ablation experiments to verify the effectiveness of the proposed operators.