Multi-Horizon Lane Change Maneuver Prediction Using Multi-Modal Transformers
Petrit Rama, Praveen Kumar Gummadi and Naim Bajcinca
AI summary
Problem
Most autonomous driving research focuses on trajectory prediction rather than ego-centric maneuver intent, and existing datasets lack diverse, densely annotated lane change sequences with structured phase transitions.
Approach
The authors propose a transformer-based architecture that adaptively fuses visual, semantic, graph-based, and sensor modalities to jointly forecast lane change maneuvers and their progression phases over multiple future time steps, complemented by a multi-level uncertainty estimation branch.
Key results
- Novel transformer-based multi-modal architecture for joint maneuver and phase prediction
- Multi-level uncertainty estimation branch quantifying confidence across modalities and predictions
- Strong multi-horizon prediction performance on diverse real-world traffic scenarios
- Introduction of WylonSet++, a new dataset with dense lane change maneuver and phase annotations
Why it matters
Provides autonomous vehicles with interpretable, uncertainty-aware intent predictions to improve safety and decision-making in complex urban driving.
Abstract
Predicting lane change maneuvers is essential for ensuring safe autonomous driving, especially in complex urban environments. Building upon prior multi-modal and graph-based approaches, this work introduces a novel transformer-based architecture for multi-horizon lane change prediction that jointly estimates the lane change maneuver and the lane change phase. The proposed model integrates visual information from surround-view cameras, semantic masks for free space and lane markings, interaction-aware graph representations, and ego-vehicle state signals within a unified transformer framework to capture spatial-temporal dependencies. In addition, a multi-level uncertainty estimation branch quantifies confidence at the level of modality, fusion, and prediction to enhance interpretability and reliability. Experiments are conducted on WylonSet++, an extended in-house dataset collected using an instrumented test vehicle, annotated for lane change behavior analysis and maneuver phase transitions. The dataset comprises synchronized front-facing camera images, left and right surround-view camera images, together with vehicle state data. The dataset contains approximately 600 lane change sequences, providing the foundation for this study. Extensive evaluations demonstrate strong performance in anticipating lane change maneuvers and phase progression across short- and long-term prediction horizons in diverse real-world traffic scenarios.
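To make the idea of joint multi-horizon prediction concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It replaces the transformer with a simple softmax-gated weighted sum of per-modality embeddings, attaches one maneuver head and one phase head per horizon, and uses predictive entropy as a stand-in for prediction-level uncertainty. The embedding size, the number of maneuver classes, phase classes, and horizons, the random weights, and all function names are assumptions chosen only for illustration; the abstract does not specify any of them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a less confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def linear(weights, vec):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def fuse(embeddings, gate_vectors):
    """Softmax-gated weighted sum of per-modality embeddings (assumed scheme)."""
    names = sorted(embeddings)
    scores = [
        sum(g * e for g, e in zip(gate_vectors[n], embeddings[n])) for n in names
    ]
    gates = softmax(scores)
    dim = len(embeddings[names[0]])
    fused = [
        sum(gates[i] * embeddings[n][j] for i, n in enumerate(names))
        for j in range(dim)
    ]
    return fused, dict(zip(names, gates))


def predict(fused, maneuver_heads, phase_heads):
    """One maneuver and one phase distribution per prediction horizon."""
    outputs = []
    for w_maneuver, w_phase in zip(maneuver_heads, phase_heads):
        p_maneuver = softmax(linear(w_maneuver, fused))
        p_phase = softmax(linear(w_phase, fused))
        outputs.append({
            "maneuver": p_maneuver,
            "phase": p_phase,
            "maneuver_entropy": entropy(p_maneuver),
            "phase_entropy": entropy(p_phase),
        })
    return outputs


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# All sizes below are hypothetical, not taken from the paper.
DIM = 8
MODALITIES = ["visual", "semantic", "graph", "ego_state"]
NUM_MANEUVERS = 3   # assumed: left change, keep lane, right change
NUM_PHASES = 4      # assumed number of lane change phases
NUM_HORIZONS = 3    # assumed number of future time steps

rng = random.Random(0)
embeddings = {n: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for n in MODALITIES}
gate_vectors = {n: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for n in MODALITIES}
maneuver_heads = [random_matrix(rng, NUM_MANEUVERS, DIM) for _ in range(NUM_HORIZONS)]
phase_heads = [random_matrix(rng, NUM_PHASES, DIM) for _ in range(NUM_HORIZONS)]

fused, gates = fuse(embeddings, gate_vectors)
predictions = predict(fused, maneuver_heads, phase_heads)

for step, pred in enumerate(predictions, start=1):
    print(step, [round(p, 3) for p in pred["maneuver"]],
          round(pred["maneuver_entropy"], 3))
```

In this sketch the gate weights give a rough modality-level importance signal and the per-horizon entropies give a rough prediction-level confidence signal; a trained model would learn the weights rather than sample them at random.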