PPT: Pretraining with Pseudo-Labeled Trajectories for Motion Forecasting
Yihong Xu, Yuan Yin, Eloi Zablocki, Tuan-Hung Vu, Alexandre Boulch, Matthieu Cord
AI summary
Problem
Current motion forecasting models rely on costly, manually annotated datasets that are hard to scale and that introduce domain gaps, limiting generalization.
Approach
PPT automatically generates diverse pseudo-labeled trajectories using off-the-shelf 3D detectors and non-learning trackers, then pretrains forecasting models on this data before optional fine-tuning on a small labeled subset.
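As a rough illustration of what a non-learning tracker does in such a pipeline, the sketch below links per-frame detection centers into trajectories by greedy nearest-neighbor association. It is not the tracker used in the paper, which this summary does not specify. The function name `link_detections`, the `max_dist` gating threshold, and the use of 2D centers instead of full 3D boxes are all assumptions made for the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]          # (x, y) center of a detected box (assumed 2D)
Track = List[Tuple[int, Point]]      # list of (frame_index, center)


def link_detections(frames: List[List[Point]], max_dist: float = 2.0) -> List[Track]:
    """Greedy nearest-neighbor association of detections into tracks.

    Hypothetical helper: a track is extended only by a detection in the very
    next frame that lies within `max_dist`; otherwise the track ends.
    """
    tracks: List[Track] = []
    active: List[int] = []  # indices of tracks extended or created in the previous frame

    for t, detections in enumerate(frames):
        # All candidate (distance, track index, detection index) pairs, closest first.
        pairs = sorted(
            (math.dist(tracks[i][-1][1], det), i, j)
            for i in active
            for j, det in enumerate(detections)
        )
        used_tracks, used_dets = set(), set()
        for dist, i, j in pairs:
            if dist > max_dist:
                break  # remaining pairs are even farther apart
            if i in used_tracks or j in used_dets:
                continue
            tracks[i].append((t, detections[j]))
            used_tracks.add(i)
            used_dets.add(j)

        # Unmatched detections start new tracks; unmatched tracks are dropped from `active`.
        next_active = sorted(used_tracks)
        for j, det in enumerate(detections):
            if j not in used_dets:
                tracks.append([(t, det)])
                next_active.append(len(tracks) - 1)
        active = next_active

    return tracks


# Toy input: two agents observed over three frames, plus one spurious detection.
frames = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(1.0, 0.1), (10.0, 1.0)],
    [(2.0, 0.0), (10.1, 2.1), (50.0, 50.0)],
]
tracks = link_detections(frames)
print([len(track) for track in tracks])
```

Note that the spurious detection in the last frame is kept as its own one-point track and not filtered out. That matches the spirit of the approach, which treats unprocessed tracker output as a pretraining signal and does not clean it first.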
Key results
- Reduces reliance on human annotations, excelling in low-data regimes (1-10% labeled data).
- Demonstrates that inherent noise and diversity in pseudo-labels improve model robustness.
- Shows post-processing is unnecessary and HD maps are optional for pretraining.
- Achieves strong cross-domain, end-to-end, and multi-class generalization.
Why it matters
It provides a scalable, cost-effective pretraining strategy that enables robust motion forecasting across diverse real-world driving environments with minimal labeled data.
Abstract
Accurately predicting how agents move in dynamic scenes is essential for safe autonomous driving. State-of-the-art motion forecasting models rely on datasets with manually annotated or post-processed trajectories. However, building these datasets is costly, generally manual, and hard to scale, and the process lacks reproducibility. The resulting datasets also introduce domain gaps that limit generalization across environments. We introduce PPT (Pretraining with Pseudo-labeled Trajectories), a simple and scalable pretraining framework that uses unprocessed and diverse trajectories automatically generated from off-the-shelf 3D detectors and tracking. Unlike data annotation pipelines aiming for clean, single-label annotations, PPT is a pretraining framework embracing off-the-shelf trajectories as useful signals for learning robust representations. With optional finetuning on a small amount of labeled data, models pretrained with PPT achieve strong performance across standard benchmarks, particularly in low-data regimes, and in cross-domain, end-to-end, and multi-class settings. PPT is easy to implement and improves generalization in motion forecasting.