ICRA 2026

Robustness of Panoptic Segmentation for Degraded Automotive Cameras Data

Yiting Wang, Haonan Zhao, Mehrdad Dianati, Kurt Debattista, Valentina Donzella

AI summary

Transformer-based models outperform CNNs in robustness to camera degradation but face real-time deployment limits, while frequency-based image quality metrics reliably predict segmentation performance.

panoptic segmentation, automotive camera robustness, degraded datasets, image quality metrics, transformer vs CNN, autonomous driving

Problem

Panoptic segmentation performance in automated driving degrades under real-world camera conditions, yet the field lacks comprehensive evaluation frameworks and realistic degraded datasets for modelling this relationship systematically.

Approach

The authors introduce a unified evaluation pipeline built on a novel dataset, D-Cityscapes+, covering 19 realistic automotive degradation types at varying severity levels. They benchmark 14 state-of-the-art segmentation model backbones on it and correlate the results with traditional and frequency-based image quality metrics.
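To make the idea of a degradation applied "at varying severity levels" concrete, here is a minimal sketch in pure Python. It is not the D-Cityscapes+ darkness model: the function name `darken_with_noise`, the 0 to 5 severity scale, the gain curve, and the noise level are all assumptions chosen for illustration.

```python
import random


def darken_with_noise(image, severity, max_severity=5, seed=0):
    """Darken a grayscale image and add Gaussian noise (toy degradation).

    image: list of rows of floats in [0, 1].
    severity: integer from 0 (clean) to max_severity (strongest).
    The gain and noise constants below are illustrative assumptions,
    not values taken from the paper.
    """
    if not 0 <= severity <= max_severity:
        raise ValueError("severity out of range")
    rng = random.Random(seed)
    level = severity / max_severity
    gain = 1.0 - 0.8 * level   # brightness falls to 20% at max severity
    sigma = 0.05 * level       # noise standard deviation grows with severity
    degraded = []
    for row in image:
        new_row = []
        for pixel in row:
            value = pixel * gain + rng.gauss(0.0, sigma)
            new_row.append(min(1.0, max(0.0, value)))  # clip to [0, 1]
        degraded.append(new_row)
    return degraded


def mean_intensity(image):
    """Average pixel value over the whole image."""
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count


# A flat mid-grey 8x8 test image, degraded at every severity level.
clean = [[0.5] * 8 for _ in range(8)]
levels = [mean_intensity(darken_with_noise(clean, s)) for s in range(6)]
```

A benchmark of this kind would run each segmentation model on every (degradation type, severity) pair and record the score, so that performance can be plotted against severity.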

Key results

Large-particle degradations like lens droplets and heavy snow cause severe performance drops and edge-concentrated errors.
Transformer architectures outperform CNNs under adverse conditions but incur high computational costs and low frame rates.
Frequency-based image quality metrics, particularly CW-SSIM, strongly correlate with and predict segmentation robustness.
Standard visual restoration techniques fail to consistently improve downstream segmentation, underscoring the need for perception-specific enhancement.
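The claim that an image quality metric "correlates with" segmentation performance can be checked with a rank correlation. The sketch below implements Spearman's coefficient in pure Python as one plausible choice; the summary does not say which correlation statistic the authors use, and the scores in the example are made-up placeholders, not results from the paper.

```python
def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(rank(x), rank(y))


# Placeholder values only: one IQA score and one panoptic quality
# score per severity level of a single hypothetical degradation.
iqa_scores = [0.95, 0.80, 0.62, 0.41, 0.20]
pq_scores = [61.0, 55.0, 47.0, 30.0, 12.0]
rho = spearman(iqa_scores, pq_scores)
```

A coefficient near 1 means the metric orders the degraded images the same way the segmentation score does, which is what would make it usable as a predictor without running the model.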

Why it matters

Provides autonomous driving engineers with a critical benchmark, predictive diagnostic tools, and clear architectural trade-offs for building reliable perception systems in adverse weather and lighting.

Abstract

Precise situational awareness is essential for the safe deployment of artificial intelligence in real-world applications, particularly in assisted and automated driving (AAD) systems. Among perception techniques, panoptic segmentation is a promising approach to identify and categorise objects, impending hazards, and drivable space at a pixel level. While panoptic quality might be affected by automotive camera data quality, a comprehensive understanding and modelling of their relationship remains underexplored. Motivated by such a need, this work proposes a unifying pipeline to evaluate the robustness of panoptic segmentation models for automotive cameras, correlating it with 8 traditional image quality metrics (IQA). The proposed pipeline begins by generating a novel degraded dataset, D-Cityscapes+, featuring 19 realistic automotive degradation types at varying severity levels, including novel models for darkness and snowfall conditions with veiling effect. Evaluations on 14 state-of-the-art segmentation model backbones yielded key insights: 1) large-particle degradations (e.g., lens droplets, heavy snow) severely degrade segmentation performance, increasing uncertainty and edge-concentrated segmentation errors; 2) Transformer-based models outperform CNN models under adverse conditions; however, longer processing time, a higher number of parameters, and computational cost are limiting their real-world deployment; 3) frequency-based IQA metrics, such as CW-SSIM, strongly correlate with segmentation performance, serving as reliable predictive tools; 4) visual enhancements via restoration do not coherently benefit downstream segmentation tasks, underscoring the need for perception-specific restoration techniques. The benchmark and code: https://github.com/Warwick-Jocelyn/BRPS.

Note to Practitioners: Reliable situational awareness is vital for autonomous systems operating under adverse conditions and potentially degraded sensor data. This study shows that while Transformer-based architectures outperform traditional CNNs in robustness, their long processing time (often <2 FPS) and high computational costs (often >500 GFLOPs) limit real-world deployment. We also find that simply increasing the quantity of training data without considering ‘noise coverage’ offers very limited improvement in model robustness under extreme weather conditions, due to the long-tail effect. These findings underscore the practical value of degraded datasets such as D-Cityscapes+, which simulate 19 realistic conditions at multiple severity levels. Moreover, frequency-based image quality metrics such as CW-SSIM are shown to correlate strongly with perception performance, offering a practical diagnostic tool for benchmarking segmentation robustness. The presented insights provide practitioners with a valuable benchmark, predictive metrics, and clear guidelines for developing robust, real-time-capable segmentation models suitable for challenging automotive environments. Future research could also explore multimodal sensing integration to enhance overall system robustness.

Fig. 1. Visual examples of the newly proposed degraded dataset (D-Cityscapes+) with 19 types of degradation, categorised from top to bottom as unfavourable light, adverse weather, internal sensor noises, motion blur, and distortion artefacts.

Affiliations: 1 WMG, University of Warwick. 2 School of Engineering and Materials Science, Queen Mary University of London. 3 Queen’s University of Belfast. The work was partially supported by the Centre for Doctoral Training to Advance Deployment of Future Mobility Technologies (CDT FMT) High-Value Manufacturing CATAPULT. Corresponding author: Yiting.Wang.1@warwick.ac.uk.
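For readers unfamiliar with the "panoptic quality" (PQ) measure the abstract refers to, the sketch below computes it from already-matched segments using the standard definition from the panoptic segmentation literature: the sum of IoUs over true-positive matches divided by |TP| + 0.5|FP| + 0.5|FN|. Whether the benchmark uses exactly this variant is an assumption, and the function name and argument layout are hypothetical.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Panoptic quality for one class.

    matched_ious: IoU of each predicted segment matched to a ground-truth
        segment. In the standard definition a match requires IoU > 0.5.
    num_false_positives: predicted segments with no match.
    num_false_negatives: ground-truth segments with no match.
    """
    for iou in matched_ious:
        if not 0.5 < iou <= 1.0:
            raise ValueError("a true-positive match needs 0.5 < IoU <= 1")
    true_positives = len(matched_ious)
    denominator = (true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        return 0.0  # no segments at all; convention chosen for this sketch
    return sum(matched_ious) / denominator


# Two matched segments, one spurious prediction, one missed object.
example_pq = panoptic_quality([0.8, 0.6], 1, 1)
```

PQ factors into segmentation quality (mean IoU of the matches) times recognition quality (an F1-style detection score), so a degradation can lower it by blurring segment boundaries, by causing missed or spurious segments, or both.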

Index terms

Computer Vision for Automation; Data Sets for Robotic Vision; Deep Learning for Visual Perception