IMPACT: Behavioral Intention-Aware Multimodal Trajectory Prediction with Adaptive Context Trimming
Jiawei Sun, Xibin Yue, Jiahui Li, Tianle Shen, Chengran Yuan, Shuo Sun, Sheng Guo, Quanyun Zhou, Marcelo H Ang Jr
AI summary
Problem
Current trajectory prediction models overlook explicit behavioral intentions and attend indiscriminately to all scene elements, causing redundancy and high computational cost. Mainstream datasets also lack ground-truth intention labels for training.
Approach
The IMPACT framework jointly predicts multimodal behavioral intentions and vectorized map occupancy using a shared encoder, then dynamically prunes irrelevant agents and map polylines before decoding trajectories.
Key results
- Ranks first among LiDAR-free methods on the Waymo Motion Dataset
- Achieves state-of-the-art performance on the Waymo Interactive Prediction Dataset
- Improves softmAP by 10% over the previous SOTA method, BETOP, on the Waymo Interactive Prediction Leaderboard, using a single model without ensembling
- Successfully deployed on real vehicles, demonstrating practical effectiveness
Why it matters
Enhances the reliability, interpretability, and efficiency of autonomous driving systems by explicitly modeling agent intentions and optimizing computational resource allocation.
Abstract
While most prior research has focused on improving the precision of multimodal trajectory predictions, the explicit modeling of multimodal behavioral intentions (e.g., yielding, overtaking) remains relatively underexplored. This paper proposes a unified framework that jointly predicts both behavioral intentions and trajectories to enhance prediction accuracy, interpretability, and efficiency. Specifically, we employ a shared context encoder for both intention and trajectory predictions, thereby reducing structural redundancy and information loss. Moreover, we address the lack of ground-truth behavioral intention labels in mainstream datasets (Waymo, Argoverse) by auto-labeling these datasets, thus advancing the community’s efforts in this direction. We further introduce a vectorized occupancy prediction module that infers the probability of each map polyline being occupied by the target vehicle’s future trajectory. By leveraging these intention and occupancy prediction priors, our method conducts dynamic, modality-dependent pruning of irrelevant agents and map polylines in the decoding stage, effectively reducing computational overhead and mitigating noise from non-critical elements. Our approach ranks first among LiDAR-free methods on the Waymo Motion Dataset and achieves SOTA performance on the Waymo Interactive Prediction Dataset. Remarkably, even without model ensembling, our single-model framework improves the softmAP by 10% compared to the previous SOTA method, BETOP, on the Waymo Interactive Prediction Leaderboard. Furthermore, the proposed framework has been successfully deployed on real vehicles, demonstrating its practical effectiveness in real-world applications.
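The modality-dependent pruning described above can be illustrated with a minimal sketch. The abstract does not state the selection rule, so this sketch assumes a simple top-k choice over per-mode occupancy probabilities. The function name `prune_by_occupancy`, the `keep_k` parameter, and the nested-list layout of `occupancy_prob` (one row per predicted mode, one column per map polyline) are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def prune_by_occupancy(
    occupancy_prob: Sequence[Sequence[float]], keep_k: int
) -> List[List[int]]:
    """Return, for each mode, the indices of the polylines to keep.

    Assumption (not from the paper): each mode keeps the keep_k polylines
    with the highest predicted occupancy probability.
    """
    if keep_k < 0:
        raise ValueError("keep_k must be non-negative")
    kept: List[List[int]] = []
    for probs in occupancy_prob:
        k = min(keep_k, len(probs))
        # Rank polyline indices by descending probability; Python's sort is
        # stable, so ties are broken in favor of the lower index.
        ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
        # Restore the original polyline order among the kept indices.
        kept.append(sorted(ranked[:k]))
    return kept


# Hypothetical example: 2 modes, 4 map polylines.
occupancy_prob = [
    [0.90, 0.10, 0.80, 0.20],  # mode 0, e.g. "go straight"
    [0.05, 0.70, 0.10, 0.60],  # mode 1, e.g. "turn left"
]
kept_indices = prune_by_occupancy(occupancy_prob, keep_k=2)
```

Each mode keeps a different subset of polylines, so a decoder that consumes `kept_indices` would attend to fewer, mode-specific map elements. The same selection could be applied to agents, given a per-mode relevance score for each agent.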