ICRA 2026

PPT: Pretraining with Pseudo-Labeled Trajectories for Motion Forecasting

Yihong Xu, Yuan Yin, Eloi Zablocki, Tuan-Hung Vu, Alexandre Boulch, Matthieu Cord

AI summary

Pretraining on noisy, diverse pseudo-labeled trajectories significantly boosts motion forecasting performance, especially when labeled data is scarce.

motion forecasting, pretraining, pseudo-labeling, autonomous driving, 3D tracking, data efficiency

Problem

Current motion forecasting relies on costly, manually annotated datasets that are hard to scale and introduce domain gaps that limit generalization.

Approach

PPT automatically generates diverse pseudo-labeled trajectories using off-the-shelf 3D detectors and non-learning trackers, then pretrains forecasting models on this data before optional fine-tuning on a small labeled subset.
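
To make the data-generation step concrete, here is a minimal sketch of how per-frame 3D detections might be linked into pseudo-labeled trajectories without any learned component. It is not PPT's actual tracker: greedy nearest-neighbor association is a stand-in for "non-learning tracker", and the names `Track`, `greedy_track`, `to_samples`, the 2D `(x, y)` detection centers, and the `max_dist` gating radius are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Track:
    track_id: int
    points: list = field(default_factory=list)  # entries are (frame_idx, x, y)


def greedy_track(frames, max_dist=2.0):
    """Link detections across frames by greedy nearest-neighbor matching.

    frames: list of per-frame lists of (x, y) detection centers.
    max_dist: hypothetical gating radius; farther pairs are never matched.
    """
    finished, active, next_id = [], [], 0
    for t, detections in enumerate(frames):
        # All (distance, track index, detection index) pairs, closest first.
        pairs = sorted(
            (math.dist(trk.points[-1][1:], det), ti, di)
            for ti, trk in enumerate(active)
            for di, det in enumerate(detections)
        )
        used_tracks, used_dets = set(), set()
        for dist, ti, di in pairs:
            if dist > max_dist:
                break  # remaining pairs are even farther apart
            if ti in used_tracks or di in used_dets:
                continue
            active[ti].points.append((t, *detections[di]))
            used_tracks.add(ti)
            used_dets.add(di)
        # Tracks with no match in this frame are closed, with no cleanup applied.
        finished.extend(trk for ti, trk in enumerate(active) if ti not in used_tracks)
        active = [trk for ti, trk in enumerate(active) if ti in used_tracks]
        # Unmatched detections start new tracks.
        for di, det in enumerate(detections):
            if di not in used_dets:
                active.append(Track(next_id, [(t, *det)]))
                next_id += 1
    return finished + active


def to_samples(track, hist_len=2, fut_len=2):
    """Slice one trajectory into (history, future) pairs for forecasting."""
    xy = [(x, y) for _, x, y in track.points]
    n = hist_len + fut_len
    return [(xy[i:i + hist_len], xy[i + hist_len:i + n])
            for i in range(len(xy) - n + 1)]


if __name__ == "__main__":
    frames = [[(0, 0), (10, 0)], [(1, 0), (10, 1)], [(2, 0), (10, 2)], [(3, 0)]]
    for trk in greedy_track(frames):
        print(trk.track_id, len(trk.points), to_samples(trk))
```

The sketch deliberately leaves fragmented or short tracks as they are, in the spirit of the summary's claim that post-processing is unnecessary. Running several detector and tracker combinations over the same frames would yield multiple noisy trajectory sets for pretraining.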

Key results

Reduces reliance on human annotations, excelling in low-data regimes (1-10% labeled data).
Demonstrates that inherent noise and diversity in pseudo-labels improve model robustness.
Shows post-processing is unnecessary and HD maps are optional for pretraining.
Achieves strong cross-domain, end-to-end, and multi-class generalization.

Why it matters

It provides a scalable, cost-effective pretraining strategy that enables robust motion forecasting across diverse real-world driving environments with minimal labeled data.

Abstract

Accurately predicting how agents move in dynamic scenes is essential for safe autonomous driving. State-of-the-art motion forecasting models rely on datasets with manually annotated or post-processed trajectories. However, building these datasets is costly, generally manual, hard to scale, and lacks reproducibility. They also introduce domain gaps that limit generalization across environments. We introduce PPT (Pretraining with Pseudo-labeled Trajectories), a simple and scalable pretraining framework that uses unprocessed and diverse trajectories automatically generated from off-the-shelf 3D detectors and tracking. Unlike data annotation pipelines aiming for clean, single-label annotations, PPT is a pretraining framework embracing off-the-shelf trajectories as useful signals for learning robust representations. With optional finetuning on a small amount of labeled data, models pretrained with PPT achieve strong performance across standard benchmarks, particularly in low-data regimes, and in cross-domain, end-to-end, and multi-class settings. PPT is easy to implement and improves generalization in motion forecasting.

Index terms

Autonomous Vehicle Navigation; Computer Vision for Automation; Vision-Based Navigation