Explaining Failures of Cyber-Physical Systems with Actual Causality
Khen Elimelech, Tom Yaacov, David A. Kelly, Hana Chockler, Moshe Y. Vardi
AI summary
Problem
Unexpected failures in black-box cyber-physical systems are difficult to diagnose, and the actual causality framework has so far been applied only to simple systems such as image classifiers, leaving a gap for accurate failure analysis in temporally-extended environments.
Approach
The authors adapt the actual causality framework to handle CPS-specific environmental variables and propose two system-agnostic algorithms: one that exhaustively searches for optimal explanations and another that uses causal responsibility heuristics for scalable efficiency.
Key results
- Theoretical extension of actual causality to temporally-extended CPS models
- Identification of critical modeling differences between classifiers and CPS environments
- Development of an exhaustive algorithm guaranteeing optimal failure explanations
- Development of a computationally efficient heuristic algorithm for scalable explanations
Why it matters
It provides a rigorous, model-agnostic method to diagnose black-box CPS failures, directly supporting safety verification, trust-building, and iterative improvement for autonomous system developers and regulators.
Abstract
Modern autonomous Cyber-Physical Systems (CPSs), such as self-driving cars, face increasingly complex demands, and yet are expected to act reliably. The black-box nature that often characterizes such systems, especially those relying on neural components, makes it impossible to fully verify the system behavior prior to deployment. Unfortunately, unexpected failures (cases in which the system does not comply with its specification) are inevitable and may have catastrophic implications. To improve trust in the system and facilitate future mitigation after a failure occurs, it is important to try to derive an explanation for the unexpected system behavior. This paper introduces the novel concept of leveraging the framework of actual causality for CPS failure explanation. Until now, this framework has been used only to derive explanations in the context of simple systems, such as image classifiers. This paper addresses the theoretical gaps and provides the guidance needed to allow for correct explanation derivation in the CPS domain. Beyond the theoretical contribution, the paper presents two novel, practical, system-agnostic explanation derivation algorithms, which allow prioritizing either explanation optimality or derivation efficiency. The approach is demonstrated and evaluated in the context of a neural-network-controlled autonomous car, designed to avoid collisions.
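To make the idea of an exhaustive, system-agnostic search concrete, here is a minimal Python sketch. It is not the paper's algorithm. It is a simplified counterfactual search that assumes the system can be queried as a black box and that each environmental variable has one alternative value to try. The names `smallest_counterfactual_set`, `toy_system_fails`, and the variables `obstacle`, `wet_road`, and `fog` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def smallest_counterfactual_set(system_fails, observed, alternatives):
    """Return the smallest set of variables whose change avoids the failure.

    system_fails: black-box predicate mapping an environment dict to True/False.
    observed:     environment values under which the failure occurred.
    alternatives: one alternative value per variable (an assumption of this sketch).
    Returns None if no combination of changes avoids the failure.
    """
    names = sorted(observed)
    # Try subsets in order of increasing size, so the first hit is minimal.
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            changed = dict(observed)
            for name in subset:
                changed[name] = alternatives[name]
            if not system_fails(changed):
                return set(subset)
    return None


# Hypothetical stand-in for a black-box controller: the car collides only
# when an obstacle is present and the road is wet; fog plays no role here.
def toy_system_fails(env):
    return env["obstacle"] and env["wet_road"]


observed = {"obstacle": True, "wet_road": True, "fog": True}
alternatives = {"obstacle": False, "wet_road": False, "fog": False}

explanation = smallest_counterfactual_set(toy_system_fails, observed, alternatives)
print(explanation)
```

The search queries the system once per candidate subset, so its cost grows exponentially with the number of variables. That cost is the kind of trade-off between explanation optimality and derivation efficiency that the abstract refers to.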