Learning Location-Specific Latent Behavior Priors for Occupancy Prediction in Automated Driving
Julian Schmidt, Mario Ruoff, Christoph Rist, Julian Jordan
AI summary
Problem
Automated driving systems heavily rely on high-definition (HD) maps for location-specific behavioral priors, but creating and maintaining these maps is costly, complex, and prone to errors. This paper asks how to learn location-specific priors directly from data without relying on HD maps or additional labels.
Approach
The method aggregates egocentric trajectory data into a global latent grid that acts as a learnable behavior prior, updating it directly via backpropagation of the occupancy prediction loss. A masking strategy sparsifies these updates to reduce noise and improve training stability.
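The idea of a grid updated by the prediction loss can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the grid size, the cell size, the linear readout standing in for the occupancy prediction model, and the `min_hits` rule, which is only a hypothetical stand-in for the masking strategy (whose exact form is not given here).

```python
import math

# All constants below are hypothetical, chosen only to keep the sketch small.
CELL_SIZE = 2.0          # metres covered by one grid cell
GRID_W, GRID_H = 8, 8    # number of cells in x and y
LATENT_DIM = 4           # size of the latent vector stored per cell
EPS = 1e-9               # avoids log(0) in the loss


def make_grid():
    """Zero-initialised global grid: grid[row][col] is a latent vector."""
    return [[[0.0] * LATENT_DIM for _ in range(GRID_W)] for _ in range(GRID_H)]


def cell_index(x, y):
    """Map a global position in metres to a clamped (row, col) index."""
    col = min(max(int(x // CELL_SIZE), 0), GRID_W - 1)
    row = min(max(int(y // CELL_SIZE), 0), GRID_H - 1)
    return row, col


def predict(grid, weights, x, y):
    """Occupancy probability from the local latent vector (linear readout + sigmoid)."""
    row, col = cell_index(x, y)
    logit = sum(w * z for w, z in zip(weights, grid[row][col]))
    return 1.0 / (1.0 + math.exp(-logit))


def train_step(grid, weights, batch, lr=0.5, min_hits=1):
    """One gradient step on binary cross-entropy.

    batch is a list of (x, y, occupied) tuples. Gradients flow into both the
    readout weights and the grid. Cells with fewer than `min_hits` samples in
    the batch are masked out of the grid update (assumed masking rule).
    """
    grid_grads = {}                      # (row, col) -> [gradient vector, hit count]
    weight_grad = [0.0] * LATENT_DIM
    loss = 0.0
    for x, y, occupied in batch:
        cell = cell_index(x, y)
        z = grid[cell[0]][cell[1]]
        p = predict(grid, weights, x, y)
        loss -= occupied * math.log(p + EPS) + (1 - occupied) * math.log(1 - p + EPS)
        err = p - occupied               # d(loss)/d(logit) for sigmoid + cross-entropy
        entry = grid_grads.setdefault(cell, [[0.0] * LATENT_DIM, 0])
        for k in range(LATENT_DIM):
            entry[0][k] += err * weights[k]   # gradient w.r.t. the cell's latent vector
            weight_grad[k] += err * z[k]      # gradient w.r.t. the readout weights
        entry[1] += 1
    n = len(batch)
    for k in range(LATENT_DIM):
        weights[k] -= lr * weight_grad[k] / n
    updated = set()
    for (row, col), (grad, hits) in grid_grads.items():
        if hits < min_hits:
            continue                     # masked: this cell keeps its old value
        for k in range(LATENT_DIM):
            grid[row][col][k] -= lr * grad[k] / n
        updated.add((row, col))
    return loss / n, updated


# Toy data: two occupied observations in one cell, one free observation in another.
batch = [(1.0, 1.0, 1), (1.5, 0.5, 1), (9.0, 9.0, 0)]
grid = make_grid()
weights = [0.5, -0.5, 0.5, -0.5]
losses = [train_step(grid, weights, batch)[0] for _ in range(200)]
print(predict(grid, weights, 1.0, 1.0), predict(grid, weights, 9.0, 9.0))
```

In this toy setting the only supervision is whether a location was occupied, which mirrors the claim that no labels beyond agent positions are needed. Raising `min_hits` restricts the update to cells with more evidence in the batch, which is one simple way to sparsify grid updates.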
Key results
- Matches or exceeds HD map-based occupancy prediction accuracy on the Lyft Level 5 motion prediction dataset, for two different prediction models
- Distills geometric and semantic location patterns purely from agent behavior
- Trains on a large real-world dataset using only the already available future agent locations as labels
- Masking strategy significantly improves performance for Transformer-based architectures
Why it matters
Points toward scalable, cost-effective automated driving systems that do not depend on expensive HD map infrastructure.
Abstract
Performance in automated driving tasks improves significantly with the incorporation of location-specific prior knowledge. This is because agent behavior usually strongly correlates with location features. A common example is the strong tendency of vehicles to follow their lane, but less obvious interactions exist as well. To this end, high-definition (HD) map information is typically collected and made available during both training and inference to act as a location prior. In this paper, we propose to aggregate location-specific information in a data-driven way. Specifically, we learn a global latent grid that acts as a behavior prior to a learned occupancy prediction model. Since the prediction loss function is directly backpropagated into the latent grid, no additional labels are required beyond the already available future agent locations. We use the large real-world Lyft Level 5 motion prediction dataset to empirically demonstrate the merit of our learned location-specific latent behavior prior. Applied to two different prediction models, our approach achieves performance comparable to or exceeding baseline models that rely on HD maps, without requiring an HD map. Additional experiments reveal that the latent behavior prior is able to distill geometric and semantic information purely from agent behavior. These results indicate that directly learning location-specific priors is a promising direction towards automated driving without costly HD maps.
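The training signal described in the abstract can be written compactly. The notation is assumed here for illustration and is not taken from the paper: G is the global latent grid, θ the parameters of the prediction model f, x the egocentric input, p the global pose used to read the relevant part of G, y the future occupancy derived from recorded agent locations, and η a step size for plain gradient descent.

```latex
\begin{align}
  \hat{y} &= f_\theta\bigl(x,\ \operatorname{crop}(G, p)\bigr), \\
  (\theta, G) &\leftarrow (\theta, G) - \eta\, \nabla_{(\theta, G)}\, \mathcal{L}(\hat{y}, y).
\end{align}
```

Because y comes only from logged future agent positions, the grid G receives gradients through the same loss as the model, with no map annotation involved.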