TrajMamba: An Ego-Motion-Guided Mamba Model for Pedestrian Trajectory Prediction from an Egocentric Perspective
Yusheng Peng, Gaofeng Zhang, Liping Zheng
AI summary
Problem
Predicting future pedestrian trajectories from an egocentric perspective is hindered by the complex superposition of pedestrian and ego-camera motion, which existing feature-fusion methods model ambiguously.
Approach
The method uses two Mamba encoders to separately extract pedestrian and ego-motion features, then employs an ego-motion-guided Mamba decoder that explicitly conditions future predictions on vehicle motion while predicting residual offsets from a constant-velocity reference.
Key results
- Proposes TrajMamba, an ego-motion-guided Mamba framework for egocentric trajectory prediction
- Introduces an ego-motion-guided Mamba decoder that explicitly models pedestrian-camera relative motion
- Achieves state-of-the-art performance on PIE and JAAD datasets, surpassing prior best models by up to 44% on displacement error metrics
- Introduces a differentiated input-output representation using a constant-velocity and constant-scaling assumption to predict residual bounding box offsets
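The residual representation in the last point can be illustrated with a minimal sketch. It assumes each bounding box is stored as `(cx, cy, w, h)` and that the reference trajectory repeats the last observed per-frame change of all four values, which is one possible reading of the constant-velocity and constant-scaling assumption; the paper's exact formulation may differ. The function names are hypothetical and do not come from the authors' code.

```python
from typing import List, Tuple

# Hypothetical box layout (an assumption for this sketch): (cx, cy, w, h).
Box = Tuple[float, float, float, float]


def constant_velocity_reference(observed: List[Box], horizon: int) -> List[Box]:
    """Extrapolate the last observed per-frame change for `horizon` future steps.

    The centre (cx, cy) moves with constant velocity, and the size (w, h)
    changes at a constant rate (assumed linear here).
    """
    if len(observed) < 2:
        raise ValueError("need at least two observed boxes to estimate a velocity")
    prev, last = observed[-2], observed[-1]
    step = tuple(b - a for a, b in zip(prev, last))
    return [
        tuple(v + k * d for v, d in zip(last, step))
        for k in range(1, horizon + 1)
    ]


def to_residuals(future: List[Box], reference: List[Box]) -> List[Box]:
    """Training target: ground-truth future boxes minus the reference boxes."""
    return [tuple(f - r for f, r in zip(fb, rb)) for fb, rb in zip(future, reference)]


def from_residuals(residuals: List[Box], reference: List[Box]) -> List[Box]:
    """Recover absolute boxes by adding predicted residuals to the reference."""
    return [tuple(r + d for r, d in zip(rb, db)) for rb, db in zip(reference, residuals)]


# Toy data, invented for illustration only.
observed = [(10.0, 20.0, 4.0, 8.0), (12.0, 21.0, 5.0, 10.0)]
ground_truth = [(15.0, 22.0, 6.0, 12.5), (18.0, 24.0, 7.0, 15.0)]

reference = constant_velocity_reference(observed, horizon=2)
residuals = to_residuals(ground_truth, reference)
restored = from_residuals(residuals, reference)
```

Under this reading, the network only has to learn the deviation from simple extrapolation, so the targets stay small whenever the pedestrian's apparent motion is close to uniform.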
Why it matters
Could improve the safety and navigation accuracy of autonomous vehicles and mobile robots by enabling more reliable pedestrian trajectory prediction from a first-person perspective in dynamic environments.
Abstract
Future trajectory prediction of a tracked pedestrian from an egocentric perspective is a key task in areas such as autonomous driving and robot navigation. The challenge of this task lies in the complex dynamic relative motion between the ego-camera and the tracked pedestrian. To address this challenge, we propose an ego-motion-guided trajectory prediction network based on the Mamba model. First, two Mamba models are used as encoders to extract pedestrian motion and ego-motion features from pedestrian movement and ego-vehicle movement, respectively. Then, an ego-motion-guided Mamba decoder explicitly models the relative motion between the pedestrian and the vehicle by integrating pedestrian motion features as historical context with ego-motion features as guiding cues to produce decoded features. Finally, the future trajectory is generated from the decoded features corresponding to the future timestamps. Extensive experiments demonstrate the effectiveness of the proposed model, which achieves state-of-the-art performance on the PIE and JAAD datasets.