ICRA 2023

FDLNet: Boosting Real-Time Semantic Segmentation by Image-Size Convolution Via Frequency Domain Learning

Qingqing Yan, Shu Li, Chengju Liu, Ming Liu, Qijun Chen


Abstract

This paper proposes FDLNet, a novel real-time semantic segmentation network based on frequency domain learning, which revisits the segmentation task from two critical perspectives: spatial structure description and multi-level feature fusion. We first devise an image-size convolution (IS-Conv) as a global frequency-domain learning operator that captures long-range dependencies in a single shot. To model spatial structure information, we construct the global structure representation path (GSRP) based on IS-Conv, which learns a unified edge-region representation at affordable complexity. For efficient and lightweight multi-level feature fusion, we propose the factorized stereoscopic attention (FSA) module, which alleviates semantic confusion and reduces feature redundancy by introducing level-wise attention before channel and spatial attention. Combining these modules yields a concise semantic segmentation framework, FDLNet. Experiments demonstrate the effectiveness and superiority of the proposed method: FDLNet achieves state-of-the-art performance on Cityscapes, reaching 76.32% mIoU at 150+ FPS and 79.0% mIoU at 41+ FPS. The code is available at https://github.com/qyan0131/FDLNet.
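The abstract does not spell out how IS-Conv is computed. One standard way to apply a convolution whose kernel is as large as the feature map is the convolution theorem: transform the feature map and the kernel to the frequency domain, multiply them elementwise, and transform back. The sketch below assumes IS-Conv behaves like such a global circular filter; it is an illustration of that technique, not the FDLNet implementation (see the linked repository for that), and the function names are hypothetical. A naive 2-D DFT keeps it dependency-free; a real model would use an FFT library and would typically learn the frequency-domain kernel directly.

```python
import cmath
from typing import List

Grid = List[List[float]]


def dft2(x, inverse=False):
    """Naive 2-D discrete Fourier transform, O((H*W)^2); only for tiny demos."""
    h, w = len(x), len(x[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for m in range(h):
                for n in range(w):
                    phase = sign * 2j * cmath.pi * (u * m / h + v * n / w)
                    acc += x[m][n] * cmath.exp(phase)
            # The inverse transform carries the 1/(H*W) normalization.
            out[u][v] = acc / (h * w) if inverse else acc
    return out


def image_size_conv_freq(feature: Grid, kernel: Grid) -> Grid:
    """Hypothetical image-size convolution via the convolution theorem:
    DFT both inputs, multiply elementwise, inverse DFT, keep the real part."""
    f_hat = dft2(feature)
    k_hat = dft2(kernel)  # in a network, this spectrum would be the learnable weight
    prod = [[f_hat[u][v] * k_hat[u][v] for v in range(len(f_hat[0]))]
            for u in range(len(f_hat))]
    return [[z.real for z in row] for row in dft2(prod, inverse=True)]


def image_size_conv_spatial(feature: Grid, kernel: Grid) -> Grid:
    """Reference: direct circular convolution with a kernel the size of the image."""
    h, w = len(feature), len(feature[0])
    return [[sum(feature[m][n] * kernel[(i - m) % h][(j - n) % w]
                 for m in range(h) for n in range(w))
             for j in range(w)]
            for i in range(h)]


if __name__ == "__main__":
    feature = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]
    kernel = [[0.5, 0.0, 0.0], [0.0, 0.0, 0.25], [0.0, 0.0, 0.0]]  # hypothetical filter
    fast = image_size_conv_freq(feature, kernel)
    slow = image_size_conv_spatial(feature, kernel)
    # Both paths compute the same global convolution (up to float error).
    print(max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb)))
```

Because every output pixel sums over every input pixel, a single application already has an image-wide receptive field, which matches the "long-range dependency in a single shot" described above. With an FFT in place of the naive DFT, the cost grows as O(HW log HW) instead of the O((HW)^2) of a direct image-size convolution.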

Index terms

Semantic Scene Understanding; Computer Vision for Transportation; Deep Learning for Visual Perception