ICRA 2026

Multi-Horizon Lane Change Maneuver Prediction Using Multi-Modal Transformers

Petrit Rama, Praveen Kumar Gummadi and Naim Bajcinca


AI summary


A unified transformer model fuses multi-camera, semantic, graph, and sensor data to jointly predict lane change maneuvers and their temporal phases across multiple horizons with quantified uncertainty.

lane change prediction; multi-modal transformers; autonomous driving; uncertainty estimation; maneuver phase; WylonSet++

Problem

Most autonomous driving research focuses on trajectory prediction rather than ego-centric maneuver intent, and existing datasets lack diverse, densely annotated lane change sequences with structured phase transitions.

Approach

The authors propose a transformer-based architecture that adaptively fuses visual, semantic, graph-based, and sensor modalities to jointly forecast lane change maneuvers and their progression phases over multiple future time steps, complemented by a multi-level uncertainty estimation branch.
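The idea of adaptively weighting modalities can be illustrated with a minimal sketch. It assumes each modality (camera, semantic, graph, ego state) has already been encoded into a fixed-length vector, scores each vector against a gate vector, turns the scores into softmax weights, and returns the weighted sum. This is a simplified stand-in and not the authors' method: the paper performs fusion inside a transformer, and the gate vector, the function name `gated_fusion`, and the toy embeddings are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(embeddings, gate_weights):
    """Fuse per-modality vectors with softmax gate weights (hypothetical sketch).

    embeddings: dict mapping modality name -> feature vector (all the same length).
    gate_weights: vector dotted with each embedding to score it; stands in
        for a learned gating layer.
    Returns (fused vector, dict mapping modality name -> gate weight).
    """
    names = list(embeddings)
    scores = [sum(w * x for w, x in zip(gate_weights, embeddings[n])) for n in names]
    alphas = softmax(scores)
    dim = len(embeddings[names[0]])
    fused = [0.0] * dim
    for alpha, name in zip(alphas, names):
        for i, x in enumerate(embeddings[name]):
            fused[i] += alpha * x  # weighted sum across modalities
    return fused, dict(zip(names, alphas))


# Toy 2-D embeddings; real encoders would produce much larger vectors.
modality_embeddings = {
    "camera": [1.0, 0.0],
    "semantic": [0.0, 1.0],
    "graph": [0.5, 0.5],
    "ego_state": [0.0, 0.0],
}
fused, weights = gated_fusion(modality_embeddings, gate_weights=[1.0, 1.0])
```

In this toy input the all-zero `ego_state` vector scores lowest and so receives the smallest weight. In a trained model the scores would come from learned layers, which lets the weighting change from one driving scene to the next.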

Key results

Novel transformer-based multi-modal architecture for joint maneuver and phase prediction
Multi-level uncertainty estimation branch quantifying confidence at the modality, fusion, and prediction levels
Strong multi-horizon prediction performance on diverse real-world traffic scenarios
Introduction of WylonSet++, a new dataset with dense lane change maneuver and phase annotations
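The paper does not say which estimator its uncertainty branch uses. One common choice for a classifier is Monte Carlo sampling (for example, MC dropout): run several stochastic forward passes and split the predictive entropy into an expected-entropy (aleatoric) term and a mutual-information (epistemic) term. The sketch below applies that decomposition to a single output distribution. The technique, the function name `decompose_uncertainty`, the class order, and the sample values are all assumptions. The same function could be run separately on modality-level, fusion-level, and prediction-level outputs.

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def decompose_uncertainty(samples):
    """Split uncertainty from S stochastic forward passes (assumed MC-style sampling).

    samples: list of probability vectors, one per forward pass.
    Returns (mean_probs, total, aleatoric, epistemic) where
      total     = H[mean of p_s]       (predictive entropy)
      aleatoric = mean of H[p_s]       (expected entropy)
      epistemic = total - aleatoric    (mutual information, >= 0)
    """
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    k = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(s) for s in samples) / n
    return mean, total, aleatoric, total - aleatoric


# Hypothetical class order: (keep lane, change left, change right).
consistent = [[0.9, 0.05, 0.05]] * 3                  # passes agree
conflicting = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]]  # passes disagree

_, total_c, alea_c, epi_c = decompose_uncertainty(consistent)
_, total_x, alea_x, epi_x = decompose_uncertainty(conflicting)
```

Both cases have the same per-pass entropy. Only the conflicting passes produce a large epistemic term, which is the signal a downstream planner could use to treat a prediction with caution.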

Why it matters

Provides autonomous vehicles with interpretable, uncertainty-aware intent predictions to improve safety and decision-making in complex urban driving.

Abstract

Predicting lane change maneuvers is essential for ensuring safe autonomous driving, especially in complex urban environments. Building upon prior multi-modal and graph-based approaches, this work introduces a novel transformer-based architecture for multi-horizon lane change prediction that jointly estimates the lane change maneuver and the lane change phase. The proposed model integrates visual information from surround-view cameras, semantic masks for free space and lane markings, interaction-aware graph representations, and ego-vehicle state signals within a unified transformer framework to capture spatial-temporal dependencies. In addition, a multi-level uncertainty estimation branch quantifies confidence at the modality, fusion, and prediction levels to enhance interpretability and reliability. Experiments are conducted on WylonSet++, an extended in-house dataset collected using an instrumented test vehicle and annotated for lane change behavior analysis and maneuver phase transitions. The dataset comprises synchronized images from the front-facing camera and the left and right surround-view cameras, together with vehicle state data, and contains approximately 600 lane change sequences, providing the foundation for this study. Extensive evaluations demonstrate strong performance in anticipating lane change maneuvers and phase progression across short- and long-term prediction horizons in diverse real-world traffic scenarios.
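Jointly forecasting the maneuver and its phase at several horizons requires, for each input frame, one (maneuver, phase) target per horizon. A minimal way to build these targets from per-frame annotations is sketched below. It assumes the labels are stored as two equal-length lists sampled at a fixed frame rate. The abstract does not specify WylonSet++'s label format, sampling rate, phase names, or horizon values, so the function `multi_horizon_targets`, the 2 frames-per-second rate, and the labels used here are hypothetical.

```python
def multi_horizon_targets(maneuvers, phases, t, horizons_s, fps):
    """Map each horizon (seconds) to the (maneuver, phase) label h seconds after frame t.

    maneuvers, phases: per-frame annotation lists of equal length (hypothetical format).
    Horizons that reach past the end of the sequence are omitted rather than padded.
    """
    if len(maneuvers) != len(phases):
        raise ValueError("maneuver and phase annotations must have equal length")
    targets = {}
    for h in horizons_s:
        idx = t + round(h * fps)  # convert the horizon to a whole-frame offset
        if idx < len(maneuvers):
            targets[h] = (maneuvers[idx], phases[idx])
    return targets


# Toy 8-frame sequence at a hypothetical 2 frames per second.
maneuvers = ["keep"] * 3 + ["left"] * 5
phases = ["none"] * 3 + ["prepare", "prepare", "cross", "cross", "complete"]
targets = multi_horizon_targets(maneuvers, phases, t=2, horizons_s=[0.5, 1.0, 2.0, 5.0], fps=2)
```

Dropping horizons that run past the end of a sequence keeps every target grounded in a real annotation. Padding them with a "no maneuver" label would be an alternative, but it would add labels that were never observed.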

Index terms

Intelligent Transportation Systems; Deep Learning Methods; Constrained Motion Planning