ICRA 2026

IMPACT: Behavioral Intention-Aware Multimodal Trajectory Prediction with Adaptive Context Trimming

Jiawei Sun, Xibin Yue, Jiahui Li, Tianle Shen, Chengran Yuan, Shuo Sun, Sheng Guo, Quanyun Zhou, Marcelo H Ang Jr

AI summary

Jointly predicting behavioral intentions and map occupancy enables dynamic context pruning, achieving state-of-the-art trajectory prediction accuracy with reduced computational overhead.

Trajectory Prediction, Behavioral Intention, Context Pruning, Autonomous Driving, Multimodal Prediction, Vectorized Occupancy

Problem

Current trajectory prediction models do not explicitly model behavioral intentions and attend indiscriminately to all scene elements, which introduces redundancy and raises computational cost. In addition, mainstream datasets provide no ground-truth intention labels for training.

Approach

The IMPACT framework uses a shared context encoder to jointly predict multimodal behavioral intentions and vectorized map occupancy, then uses these predictions to dynamically prune irrelevant agents and map polylines, separately for each modality, during trajectory decoding.
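The modality-dependent pruning step can be pictured as a per-mode top-k selection over scene elements. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes one occupancy probability per map polyline and one relevance score per agent for each intention mode (how agent relevance is scored is not described here, so those scores are hypothetical), and it uses fixed budgets `keep_polylines` and `keep_agents`, where the paper may use thresholds or another rule. The function names are illustrative.

```python
from typing import Dict, List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest scores, in ascending index order."""
    k = max(0, min(k, len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def prune_context_per_mode(
    polyline_occupancy: Sequence[Sequence[float]],  # [num_modes][num_polylines], probabilities in [0, 1]
    agent_relevance: Sequence[Sequence[float]],     # [num_modes][num_agents], hypothetical relevance scores
    keep_polylines: int,                            # hypothetical per-mode polyline budget
    keep_agents: int,                               # hypothetical per-mode agent budget
) -> List[Dict[str, List[int]]]:
    """For each intention mode, keep only the top-scoring polylines and agents.

    In this sketch, the decoder query for mode m would attend only to the
    returned indices instead of to every element in the scene.
    """
    pruned = []
    for occ, rel in zip(polyline_occupancy, agent_relevance):
        pruned.append({
            "polylines": top_k_indices(occ, keep_polylines),
            "agents": top_k_indices(rel, keep_agents),
        })
    return pruned
```

Because cross-attention cost grows linearly with the number of keys, restricting each mode's decoder query to k retained elements instead of all N scene elements reduces that per-query cost by roughly a factor of N/k, and it also keeps low-relevance elements from contributing attention noise.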

Key results

Ranks first among LiDAR-free methods on the Waymo Motion Dataset
Achieves state-of-the-art performance on the Waymo Interactive Prediction Dataset
Improves softmAP by 10% over the previous SOTA method, BETOP, on the Waymo Interactive Prediction Leaderboard, using a single model without ensembling
Deployed on real vehicles, demonstrating practical effectiveness

Why it matters

Improves the accuracy, interpretability, and efficiency of trajectory prediction for autonomous driving by explicitly modeling agent intentions and restricting decoding to the scene elements relevant to each predicted intention.

Abstract

While most prior research has focused on improving the precision of multimodal trajectory predictions, the explicit modeling of multimodal behavioral intentions (e.g., yielding, overtaking) remains relatively underexplored. This paper proposes a unified framework that jointly predicts both behavioral intentions and trajectories to enhance prediction accuracy, interpretability, and efficiency. Specifically, we employ a shared context encoder for both intention and trajectory predictions, thereby reducing structural redundancy and information loss. Moreover, we address the lack of ground-truth behavioral intention labels in mainstream datasets (Waymo, Argoverse) by auto-labeling these datasets, thus advancing the community’s efforts in this direction. We further introduce a vectorized occupancy prediction module that infers the probability of each map polyline being occupied by the target vehicle’s future trajectory. By leveraging these intention and occupancy prediction priors, our method performs dynamic, modality-dependent pruning of irrelevant agents and map polylines in the decoding stage, effectively reducing computational overhead and mitigating noise from non-critical elements. Our approach ranks first among LiDAR-free methods on the Waymo Motion Dataset and achieves SOTA performance on the Waymo Interactive Prediction Dataset. Remarkably, even without model ensembling, our single-model framework improves the softmAP by 10% compared to the previous SOTA method, BETOP, on the Waymo Interactive Prediction Leaderboard. Furthermore, the proposed framework has been successfully deployed on real vehicles, demonstrating its practical effectiveness in real-world applications.
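The abstract does not say how targets for the vectorized occupancy module are constructed. One plausible construction is sketched below, assuming a map polyline counts as occupied when some ground-truth future waypoint of the target vehicle lies within a hypothetical radius of it, and assuming the module is trained with a per-polyline binary cross-entropy on logits. The names `occupancy_labels` and `occupancy_bce`, and the 2.0 m default radius, are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a == b
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment's supporting line, clamped to the endpoints.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def occupancy_labels(
    polylines: Sequence[Sequence[Point]],
    future: Sequence[Point],
    radius: float = 2.0,  # hypothetical threshold, in metres
) -> List[int]:
    """Label each non-empty polyline 1 if any future waypoint lies within radius of it, else 0."""
    labels = []
    for poly in polylines:
        # A single-point polyline is treated as one degenerate segment.
        segments = list(zip(poly[:-1], poly[1:])) or [(poly[0], poly[0])]
        hit = any(
            point_segment_distance(p, a, b) <= radius
            for p in future
            for a, b in segments
        )
        labels.append(int(hit))
    return labels


def occupancy_bce(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy between sigmoid(logits) and 0/1 occupancy labels."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

For example, with one lane polyline along the x-axis from (0, 0) to (10, 0), a second lane 10 m away, and future waypoints (1, 0.5) and (5, 1), `occupancy_labels` returns [1, 0]. The predicted occupancy probabilities from such a module are what a pruning step could rank per mode.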

Index terms

Motion and Path Planning, Computer Vision for Transportation, Task and Motion Planning