ICRA 2024

Efficient Semantic Segmentation for Compressed Video

Jiaxin Cai, Qi Li, Yulin Shen, Jia Pan, Wenxi Liu


Abstract

Robots, constrained by limited onboard computing resources, often encounter situations in which the high-resolution, high-bit-rate videos captured by their cameras must be compressed before further analysis. In this paper, we propose a novel video semantic segmentation paradigm for compressed video. Specifically, our framework draws inspiration from the principle of the Wavelet Transform: we design a network structure, WTDecomNet, that approximates the decomposition of a high-resolution image into its low-resolution counterpart and axial details. The aim is to preserve the image content well through decomposition while maintaining model efficiency by obtaining semantics from the low-resolution image. To this end, we propose an efficient axial subband approximation module for extracting axial details and a lightweight temporal alignment module for associating the keyframes and non-keyframes of compressed video. Through comprehensive experiments, we show that our model achieves state-of-the-art performance on public benchmarks. In particular, on CamVid, compared with the baseline, our proposed model reduces the computational overhead by ∼70% while improving mIoU by ∼4%.
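
For readers unfamiliar with the wavelet principle the abstract refers to, the following is a minimal sketch of a classical one-level 2D Haar decomposition in NumPy. It is not the authors' WTDecomNet, which approximates such a decomposition with a learned network; the function names `haar_decompose` and `haar_reconstruct` are hypothetical, and reading the horizontal and vertical subbands as the abstract's "axial details" is an assumption made here for illustration only.

```python
import numpy as np


def haar_decompose(img):
    """One-level 2D Haar decomposition of an (H, W) array with even H and W.

    Returns four half-resolution subbands:
      ll: low-resolution approximation
      lh: detail from differences between neighbouring columns
      hl: detail from differences between neighbouring rows
      hh: diagonal detail
    """
    img = np.asarray(img, dtype=float)
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh


def haar_reconstruct(ll, lh, hl, hh):
    """Invert haar_decompose, recovering the full-resolution array."""
    h, w = ll.shape
    img = np.empty((2 * h, 2 * w), dtype=float)
    img[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    img[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    img[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    img[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return img


if __name__ == "__main__":
    frame = np.arange(16, dtype=float).reshape(4, 4)  # toy stand-in for a frame
    ll, lh, hl, hh = haar_decompose(frame)
    print(ll.shape)  # half the resolution along each axis
    print(np.allclose(haar_reconstruct(ll, lh, hl, hh), frame))
```

The point of the sketch is that the low-resolution band has a quarter of the pixels, so extracting semantics from it costs less, while the detail bands retain what is needed to recover the full-resolution content.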

Index terms

Object Detection, Segmentation and Categorization; Deep Learning for Visual Perception