ParkDiffusion++: Ego Intention Conditioned Joint Trajectory Prediction for Automated Parking Using Diffusion Models
Jiarong WEI, Anna Rehr, Christian Feist, Abhinav Valada
AI summary
Problem
Automated parking requires predicting multiple plausible ego intentions and the corresponding joint responses of surrounding agents, but existing methods treat these interdependent problems in isolation and lack supervision for counterfactual scenarios.
Approach
The method uses a two-stage framework: first, an ego intention tokenizer predicts discrete endpoint intentions from scene context; second, an ego-conditioned joint predictor generates socially consistent multi-agent trajectories, refined by a safety-guided denoiser and trained with counterfactual knowledge distillation to handle unobserved what-if scenarios.
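The first stage, turning a continuous ego goal into a few discrete endpoint intentions, can be illustrated with a minimal sketch. It assumes a fixed vocabulary of candidate ego endpoints (hypothetical here, e.g. parking-slot centers) and a learned network, not shown, that outputs one logit per vocabulary entry. Under those assumptions, the observed endpoint is discretized to its nearest token to form a cross-entropy target, and the K most probable tokens become the intentions that the second stage conditions on. All names (`nearest_token`, `intention_ce_loss`, `top_k_intentions`) are illustrative and are not the authors' code.

```python
import math

Point = tuple[float, float]  # candidate ego endpoint (x, y); units assumed to be metres


def nearest_token(endpoint: Point, vocab: list[Point]) -> int:
    """Discretize a continuous ego endpoint into the index of the closest vocabulary entry."""
    return min(range(len(vocab)), key=lambda i: math.dist(endpoint, vocab[i]))


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one logit per vocabulary token."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def intention_ce_loss(logits: list[float], vocab: list[Point], gt_endpoint: Point) -> float:
    """Cross-entropy against the token nearest to the observed ego endpoint."""
    target = nearest_token(gt_endpoint, vocab)
    return -math.log(softmax(logits)[target])


def top_k_intentions(logits: list[float], vocab: list[Point], k: int) -> list[tuple[Point, float]]:
    """Return the k most probable endpoint tokens with their probabilities, best first."""
    probs = softmax(logits)
    best = sorted(range(len(vocab)), key=lambda i: probs[i], reverse=True)[:k]
    return [(vocab[i], probs[i]) for i in best]


if __name__ == "__main__":
    vocab = [(0.0, 10.0), (5.0, 10.0), (10.0, 10.0), (15.0, 0.0)]
    logits = [0.1, 2.0, 1.5, -1.0]  # stand-in for a network's output
    print(nearest_token((4.2, 9.1), vocab))  # → 1
    for endpoint, p in top_k_intentions(logits, vocab, k=2):
        print(endpoint, round(p, 3))
```

In a two-stage design like the one summarized above, each returned endpoint would then be fed to the joint predictor as its conditioning input, one joint prediction per intention. How the actual vocabulary and K are chosen is not stated in this summary.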
Key results
- State-of-the-art performance on Dragon Lake Parking and inD datasets
- Socially consistent joint trajectories conditioned on alternative ego intentions
- Novel counterfactual knowledge distillation module for unobserved scenarios
- Accurate what-if predictions showing appropriate reactive behaviors from surrounding agents
Why it matters
Enables safer and more robust decision-making for automated parking systems by modeling complex multi-agent interactions and counterfactual scenarios.
Abstract
Automated parking is a challenging operational domain for advanced driver assistance systems, requiring robust scene understanding and interaction reasoning. The key challenge is twofold: (i) predicting multiple plausible ego intentions from the scene context and (ii) predicting, for each intention, the joint responses of surrounding agents, which enables effective what-if decision-making. However, existing methods often fall short because they typically treat these interdependent problems in isolation. We propose ParkDiffusion++, which jointly learns a multi-modal ego intention predictor and an ego-conditioned multi-agent joint trajectory predictor for automated parking. Our approach makes four key contributions. First, we introduce an ego intention tokenizer that predicts a small set of discrete endpoint intentions from agent histories and vectorized map polylines. Second, we perform ego-intention-conditioned joint prediction, yielding socially consistent predictions of the surrounding agents for each possible ego intention. Third, we employ a lightweight safety-guided denoiser with different constraints to refine joint scenes during training, improving both accuracy and safety. Fourth, we propose counterfactual knowledge distillation, in which an exponential moving average (EMA) teacher, whose outputs are refined by a frozen safety-guided denoiser, provides pseudo-targets that capture how agents react to alternative ego intentions. Extensive evaluations demonstrate that ParkDiffusion++ achieves state-of-the-art performance on the Dragon Lake Parking (DLP) dataset and the Intersections Drone (inD) dataset. Importantly, qualitative what-if visualizations show that other agents react appropriately to different ego intentions.
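The third and fourth contributions can be sketched together as follows. This is a minimal illustration under loudly stated assumptions, not the authors' method. The learned safety-guided denoiser is replaced by a hand-written gradient step on a single pairwise collision penalty, max(0, radius − d)², computed per time step. Model parameters are a plain dict of floats. `teacher_scene` stands for the EMA teacher's joint prediction under an alternative ego intention for which no ground truth exists. All names and defaults (`ema_update`, `safety_refine`, `counterfactual_kd_loss`, `decay=0.999`, `radius=2.0`, `step=0.25`) are hypothetical.

```python
import math

Traj = list[tuple[float, float]]  # one agent's (x, y) positions over T time steps


def ema_update(teacher: dict[str, float], student: dict[str, float], decay: float = 0.999) -> None:
    """In place: teacher <- decay * teacher + (1 - decay) * student, per parameter."""
    for name, w in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * w


def safety_refine(scene: list[Traj], radius: float = 2.0, step: float = 0.25,
                  iters: int = 10) -> list[Traj]:
    """Stand-in for a frozen safety-guided denoiser (assumption): gradient descent on
    the sum over agent pairs and time steps of max(0, radius - d)^2, pushing apart
    agents that come closer than `radius` at the same time step."""
    pts = [[[x, y] for x, y in traj] for traj in scene]  # mutable copy
    for _ in range(iters):
        for t in range(len(pts[0])):
            grads = [[0.0, 0.0] for _ in pts]
            for i in range(len(pts)):
                for j in range(i + 1, len(pts)):
                    dx = pts[i][t][0] - pts[j][t][0]
                    dy = pts[i][t][1] - pts[j][t][1]
                    d = math.hypot(dx, dy)
                    if 0.0 < d < radius:
                        # gradient of (radius - d)^2 w.r.t. agent i is -2 (radius - d) (p_i - p_j) / d
                        c = -2.0 * (radius - d) / d
                        grads[i][0] += c * dx
                        grads[i][1] += c * dy
                        grads[j][0] -= c * dx
                        grads[j][1] -= c * dy
            for k, (gx, gy) in enumerate(grads):
                pts[k][t][0] -= step * gx
                pts[k][t][1] -= step * gy
    return [[(x, y) for x, y in traj] for traj in pts]


def counterfactual_kd_loss(student_scene: list[Traj], teacher_scene: list[Traj]) -> float:
    """Mean squared displacement between the student's joint prediction and the
    safety-refined teacher pseudo-target for the same (alternative) ego intention."""
    target = safety_refine(teacher_scene)
    errors = [(sx - tx) ** 2 + (sy - ty) ** 2
              for s_traj, t_traj in zip(student_scene, target)
              for (sx, sy), (tx, ty) in zip(s_traj, t_traj)]
    return sum(errors) / len(errors)
```

With `step=0.25`, a single overlapping pair is pushed apart to exactly `radius` in one iteration. In a real training loop the pseudo-target would be treated as a constant (no gradients flow into the teacher), and `ema_update` would typically run after each student optimizer step, as in standard mean-teacher training.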