Adaptive Splitting of Reusable Temporal Monitors for Rare Traffic Violations
Craig Innes, Subramanian Ramamoorthy
Abstract
Autonomous Vehicles (AVs) are often tested in simulation to estimate the probability they will violate safety specifications. Two common issues arise when using existing techniques to produce this estimation: If violations occur rarely, simple Monte-Carlo sampling techniques can fail to produce efficient estimates; if simulation horizons are too long, importance sampling techniques (which learn proposal distributions from past simulations) can fail to converge. This paper addresses both issues by interleaving rare-event sampling techniques with online specification monitoring algorithms. We use adaptive multi-level splitting to decompose simulations into partial trajectories, then calculate the distance of those partial trajectories to failure by leveraging robustness metrics from Signal Temporal Logic (STL). By caching those partial robustness metric values, we can efficiently re-use computations across multiple sampling stages. Our experiments on an interstate lane-change scenario show our method is viable for testing simulated AV-pipelines, efficiently estimating failure probabilities for STL specifications based on real traffic rules. We produce better estimates than Monte-Carlo and importance sampling in fewer simulations.
I. STATISTICAL SIMULATION FOR AV TESTING
Autonomous Vehicles (AVs) typically undergo rigorous simulated testing before deployment [36]. A standard set of steps for testing is as follows: First we define a scenario (e.g., a highway lane-change expressed in OpenScenario [39]). Next, we define a safety specification (e.g., “avoid impeding traffic flow”) in a formal language like Signal Temporal Logic (STL). Then, we run stochastic simulations to estimate the probability our AV-system violates our specification [11]. This statistical simulation approach is used because modern AVs contain “black box” components like Neural-Network perception modules and non-linear solvers.
Such components provide few analytical guarantees over their behaviour.
A core problem plaguing statistical simulation is estimating rare events. Consider a stochastic simulation scenario where there exists a $10^{-4}$ probability that random noise in the sensors will cause our AV to “fail” (i.e., to violate our safety specification). If we ran 100 simulations, it is likely none would produce a failure. Even if sampling did produce a failure, estimation variance would be unacceptable [23].
Fig. 1: Lane-change. Moving vehicles (blue) shown with trajectory. ‘Ego’ vehicle must avoid static obstacles (red). We monitor the safety constraint shown in English and STL. “Preserve Traffic Flow”: $\square_{[0,\infty]}\big(\neg\,\text{slow\_leading\_vehicle}(x_{ego}, x_{o_1 \ldots o_3}) \implies \text{preserves\_flow}(x_{ego})\big)$
Authors from the University of Edinburgh, EH8 9AB, United Kingdom. Corresponding author: craig.innes@ed.ac.uk. Work supported by a grant from the UKRI Strategic Priorities Fund to the UKRI Research Node on Trustworthy Autonomous Systems Governance and Regulation (EP/V026607/1, 2020-2024). For the purpose of open access, the author(s) has applied a Creative Commons Attribution (CC BY) license to any Accepted Manuscript version arising
Many works address rare-event problems for AVs via Importance Sampling [6]: Importance samplers draw simulations from a proposal distribution where the factors leading to a failure occur more frequently. The final estimate of failure probability is then re-weighted to reflect the original distribution. Since we do not know in advance all combinations of states which result in failure, such techniques must learn a good proposal. This learning step has no convergence guarantees, and probability estimates from such adaptive techniques can have unbounded error [3].
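The rare-event arithmetic and the re-weighting step described above can be illustrated with a small sketch. It is a toy stand-in, not the simulator or the samplers evaluated in this paper: the "simulation" is a single standard-normal draw, "failure" is exceeding a threshold chosen so that the failure probability is roughly $10^{-4}$, and the proposal is a fixed, hand-picked shifted Gaussian rather than a learned one. All function and variable names are hypothetical.

```python
import math
import random


def prob_no_failure(p_fail: float, n_sims: int) -> float:
    """Probability that n_sims independent simulations contain zero failures."""
    return (1.0 - p_fail) ** n_sims


def mc_relative_std(p_fail: float, n_sims: int) -> float:
    """Relative standard error of the plain Monte-Carlo estimate of p_fail."""
    return math.sqrt((1.0 - p_fail) / (n_sims * p_fail))


def normal_pdf(x: float, mean: float = 0.0) -> float:
    """Density of a unit-variance Gaussian centred at `mean`."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def importance_sampling_estimate(threshold: float, shift: float,
                                 n_sims: int, rng: random.Random) -> float:
    """Estimate P(X > threshold) for X ~ N(0, 1) by sampling from N(shift, 1).

    Each failing sample is re-weighted by the likelihood ratio
    original_density / proposal_density, so the estimate reflects the
    original distribution rather than the proposal.
    """
    total = 0.0
    for _ in range(n_sims):
        x = rng.gauss(shift, 1.0)          # draw from the proposal
        if x > threshold:                  # toy "failure" event
            total += normal_pdf(x) / normal_pdf(x, shift)
    return total / n_sims


# Assumed toy setup: a threshold whose exceedance probability is about 1e-4.
THRESHOLD = 3.719
true_p = 0.5 * math.erfc(THRESHOLD / math.sqrt(2.0))

rng = random.Random(0)
is_estimate = importance_sampling_estimate(
    THRESHOLD, shift=THRESHOLD, n_sims=10_000, rng=rng
)

print("P(no failure in 100 sims):", prob_no_failure(1e-4, 100))
print("MC relative std (100 sims):", mc_relative_std(1e-4, 100))
print("true p:", true_p, "importance-sampling estimate:", is_estimate)
```

In this toy setting a good proposal is known in advance (shift the mean onto the threshold). The difficulty raised in the text is that, for a full AV pipeline, no such proposal is available beforehand and it has to be learned from past simulations.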
Importance sampling also tends to fare better when failures are caused by instantaneous single-state errors, but in the AV domain, failures often occur as a result of accumulated errors over dependent states [9].
This paper instead proposes an approach to AV rare-event simulation based on merging Adaptive Multi-level Splitting (AMS) [9] with STL monitoring. AMS relies on estimating probabilities for a sequence of decreasing failure thresholds $\gamma_1 > \gamma_2 > \cdots > \gamma_M$, where the final $\gamma_M$ is equivalent to the rare failure event of interest. The key idea is that estimating any intermediate $\gamma_i$ (given $\gamma_{i-1}$) is easier than estimating $\gamma_M$ outright.
To adapt AMS from estimating isolated phenomena (e.g., particle transport [27]) to estimating complex AV-system failures, we face two issues: The main issue is how to consistently produce simulations which fall below those intermediate failure thresholds $\gamma_{0 \ldots M}$, and how to efficiently measure the distance to failure in the first place. Our approach measures failure using metrics for evaluating STL specification robustness. By leveraging online monitoring [13], we can cache the robustness values of partial trajectories, stop simulations at the point where they fall below the current threshold, and re-sample from this point onwards to produce trajectories which fall below subsequent failure thresholds.
2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 14-18, 2024, Abu Dhabi, UAE. 979-8-3503-7769-9/24/$31.00 ©2024 IEEE
A secondary issue is generating stochastic AV perceptual errors. Approaches which assume noise follows a well-known (e.g., Gaussian) state-independent distribution [7] are insufficient to capture the perceptual variety of a typical AV-system: a LiDAR detector may be great for close range traffic, but terrible for long range or occluded traffic [33]. We therefore use a Perception Error Model (PEM) [34], a