ICRA 2026

Learning Location-Specific Latent Behavior Priors for Occupancy Prediction in Automated Driving

Julian Schmidt, Mario Ruoff, Christoph Rist, Julian Jordan


AI summary


A learned latent grid can replace costly HD maps while matching or exceeding their performance in occupancy prediction.

Occupancy prediction, Latent behavior priors, Automated driving, HD map-free learning, Motion forecasting, Data-driven priors

Problem

Automated driving systems rely heavily on high-definition (HD) maps for location-specific behavioral priors, but creating and maintaining these maps is costly, complex, and error-prone. This paper asks how to learn such priors directly from data, without HD maps or additional labels.

Approach

The method aggregates egocentric trajectory data into a global latent grid that acts as a learnable behavior prior, updating it directly via backpropagation of the occupancy prediction loss. A masking strategy sparsifies these updates to reduce noise and improve training stability.
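The idea of a grid that is updated by the prediction loss can be illustrated with a toy sketch. This is not the paper's implementation: the grid size, cell size, the linear logistic read-out with frozen weights, and the `mask_fn` rule are all hypothetical choices made to keep the example small and dependency-free. It only shows the mechanism, namely that the gradient of an occupancy loss flows into the latent vector of the cell that was looked up, and that a mask can suppress some of those updates.

```python
import math

# Hypothetical toy dimensions; the paper's grid resolution is not stated here.
GRID_H, GRID_W, LATENT_DIM = 8, 8, 4
CELL_SIZE = 10.0  # metres per grid cell (assumption)


def make_grid():
    """Global latent grid: one learnable vector per world-frame cell."""
    return [[[0.0] * LATENT_DIM for _ in range(GRID_W)] for _ in range(GRID_H)]


def world_to_cell(x, y):
    """Map a world-frame position to a (row, col) grid index, clamped to the grid."""
    col = min(max(int(x // CELL_SIZE), 0), GRID_W - 1)
    row = min(max(int(y // CELL_SIZE), 0), GRID_H - 1)
    return row, col


def predict(latent, weights, bias):
    """Stand-in occupancy head: logistic regression on the looked-up latent."""
    z = sum(w * g for w, g in zip(weights, latent)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_step(grid, weights, bias, samples, lr=0.5, mask_fn=None):
    """One pass over (x, y, occupied) samples; returns the mean BCE loss.

    Only the grid is updated (the head is frozen to keep the sketch short).
    mask_fn(row, col) -> bool is a hypothetical rule that decides whether a
    cell may receive a gradient update.
    """
    eps = 1e-12
    total = 0.0
    for x, y, occupied in samples:
        row, col = world_to_cell(x, y)
        latent = grid[row][col]
        p = predict(latent, weights, bias)
        total -= occupied * math.log(max(p, eps)) + (1 - occupied) * math.log(max(1 - p, eps))
        if mask_fn is not None and not mask_fn(row, col):
            continue  # masked cell: loss is counted, but no update is applied
        err = p - occupied  # dLoss/dz for sigmoid + binary cross-entropy
        for d in range(LATENT_DIM):
            latent[d] -= lr * err * weights[d]  # dz/dlatent[d] = weights[d]
    return total / len(samples)
```

Calling `train_step` repeatedly on samples such as `[(5.0, 5.0, 1), (25.0, 5.0, 0)]` changes only the two cells that were looked up; every other cell keeps its initial value, which is the sense in which the prior is location-specific.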

Key results

Matches or exceeds HD map-based occupancy prediction accuracy on the Lyft Level 5 motion prediction dataset, for two different prediction models
Distills geometric and semantic location patterns purely from agent behavior
Scales to large-scale real-world datasets without auxiliary labels
Masking strategy significantly improves performance for Transformer-based architectures

Why it matters

Points toward scalable, cost-effective automated driving systems that do not depend on expensive HD map infrastructure.

Abstract

Performance in automated driving tasks improves significantly with the incorporation of location-specific prior knowledge. This is because agent behavior usually strongly correlates with location features. A common example is the strong tendency of vehicles to follow their lane, but less obvious interactions exist as well. To this end, high definition (HD) map information is typically collected and made available during both training and inference to act as a location prior. In this paper, we propose to aggregate location-specific information in a data-driven way. Specifically, we learn a global latent grid that acts as a behavior prior to a learned occupancy prediction model. Since the prediction loss function is directly backpropagated into the latent grid, no additional labels are required beyond the already available future agent locations. We use the large real-world Lyft Level 5 motion prediction dataset to empirically demonstrate the merit of our learned location-specific latent behavior prior. Applied to two different prediction models, our approach achieves performance comparable to or exceeding baseline models that rely on HD maps, without requiring an HD map. Additional experiments reveal that the latent behavior prior is able to distill geometric and semantic information purely from agent behavior. These results indicate that directly learning location-specific priors is a promising direction towards automated driving without costly HD maps.

Index terms

AI-Based Methods, Imitation Learning, Semantic Scene Understanding