NSF-HRPT: Neural Semantic Field Meets Hierarchical Risk Perception Tree for Safety-Critical Scenario Assessment
Yu Zhao, Jiangyu Pan, Tao Hu, Ming Yin, Fan Yang, Jiangfan Liu, Xiubo Liang
AI summary
Problem
Quantifying precise, agent-specific collision risk from monocular vision is hindered by sparse real-world critical incident data, complex multi-agent dynamics, and the sim-to-real domain gap.
Approach
The method trains a Neural Semantic Field on simulation data to predict scene semantics and probabilistic Time-to-Collision, then uses a Hierarchical Risk Perception Tree for efficient parallel risk reasoning at inference, augmented by foundation model priors for real-world adaptation.
Key results
- State-of-the-art performance on synthetic safety-critical benchmarks
- Competitive TTC estimation and risk localization on real-world datasets
- Real-time, parallel multi-agent risk assessment from monocular inputs
- Effective sim-to-real domain bridging without model retraining
Why it matters
Enables autonomous vehicles to make reliable, safety-critical decisions by providing accurate, real-time collision risk awareness from standard camera feeds.
Abstract
The ability to accurately assess and anticipate risks in safety-critical scenarios is crucial for autonomous driving systems. While existing research has made progress in collision prediction, accurately quantifying risk levels from monocular vision inputs remains challenging due to the complex dynamics of multi-agent interactions and the inherent uncer- tainty in real-world environments. To address these challenges, we present NSF-HRPT, a novel framework that combines learning-based perception with structured reasoning for quanti- tative risk assessment. Our approach features a Neural Seman- tic Field (NSF) that learns to model scene semantics, trajectory predictions, and probabilistic Time-to-Collision (TTC) distribu- tions from simulation data. During inference, the pre-trained NSF serves as a prior for our Hierarchical Risk Perception Tree (HRPT), which enables efficient parallel computation and spatial reasoning about multi-agent risks. Additionally, we introduce a Sim2Real enhancement strategy that improves real- world applicability without retraining by incorporating priors from foundation models. Extensive evaluations demonstrate that our framework achieves state-of-the-art performance on synthetic benchmarks and delivers competitive, near-state-of- the-art results on real-world datasets for both TTC estimation accuracy and risk localization precision. The proposed method provides an effective solution for real-time risk awareness from monocular camera inputs.