RealTraj: Towards Real-World Pedestrian Trajectory Forecasting
Ryo Fujii, Hideo Saito, Ryo Hachiuma
AI summary
Problem
Conventional pedestrian trajectory forecasting models rely on perfect tracking data, expensive real-world data collection, and costly person ID annotations, severely limiting their practical deployment.
Approach
The authors introduce RealTraj, which pretrains a detection-based Transformer on synthetic data using self-supervised pretext tasks, then fine-tunes it on limited real-world data using only ground-truth detections without identity labels.
Key results
- Det2TrajFormer model remains robust to tracking noise by using raw detections
- Self-supervised pretraining on synthetic data significantly reduces real-world data requirements
- Weakly-supervised fine-tuning eliminates the need for costly person ID annotations
- Outperforms state-of-the-art trajectory forecasting methods on multiple datasets
Why it matters
Enables robust, cost-effective pedestrian motion prediction for autonomous driving and robotics by removing reliance on perfect tracking and expensive annotations.
Abstract
This paper jointly addresses three key limitations in conventional pedestrian trajectory forecasting: pedestrian perception errors, real-world data collection costs, and person ID annotation costs. We propose a novel framework, RealTraj, that enhances the real-world applicability of trajectory forecasting. Our approach includes two training phases, self-supervised pretraining on synthetic data and weakly-supervised fine-tuning with limited real-world data, to minimize data collection efforts. To improve robustness to real-world errors, we focus on both model design and training objectives. Specifically, we present Det2TrajFormer, a trajectory forecasting model that remains invariant to tracking noise by using past detections as inputs. Additionally, we pretrain the model using multiple pretext tasks, which enhance robustness and improve forecasting performance based solely on detection data. Unlike previous trajectory forecasting methods, our approach fine-tunes the model using only ground-truth detections, reducing the need for costly person ID annotations. In the experiments, we comprehensively verify the effectiveness of the proposed method against the limitations, and the method outperforms state-of-the-art trajectory forecasting methods on multiple datasets.
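
To make the idea of detection-based input concrete, the following is a minimal Python sketch of how identity-free detections might be arranged before being passed to a forecasting model. It is not the authors' implementation: the function name detections_to_tokens, the (frame index, x, y) token layout, and the in-frame sorting are all assumptions made for illustration. The abstract states only that past detections, rather than tracked trajectories, are used as inputs.

```python
from typing import List, Tuple

Detection = Tuple[float, float]      # (x, y) position; deliberately no person ID
Token = Tuple[int, float, float]     # (frame index, x, y); hypothetical layout


def detections_to_tokens(frames: List[List[Detection]]) -> List[Token]:
    """Flatten per-frame detections into time-stamped tokens without identities.

    Assumption for illustration: each detection becomes one token tagged
    only with its frame index, so no cross-frame association is required.
    """
    tokens: List[Token] = []
    for t, detections in enumerate(frames):
        # Sort within a frame so the result does not depend on detector ordering.
        for x, y in sorted(detections):
            tokens.append((t, x, y))
    return tokens


# Three observed frames for a scene with two pedestrians.
frames = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(5.1, 5.0), (0.1, 0.0)],   # detector returned them in swapped order
    [(0.2, 0.0)],               # one pedestrian was missed in this frame
]
tokens = detections_to_tokens(frames)
```

Because no person ID is attached, a swapped detector ordering or a missed detection changes only which tokens are present. There is no identity switch or broken tracklet to corrupt the input, which is the kind of tracking noise the abstract refers to.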