Hyper-STTN: Hypergraph Augmented Spatial-Temporal Transformer for Trajectory Prediction
Weizheng Wang, Baijian Yang, Sungeun Hong, Wenhai Sun, Byung-Cheol Min
AI summary
Problem
Accurately forecasting human trajectories in crowds remains difficult due to complex pairwise spatial-temporal interactions and heterogeneous groupwise dynamics that existing models fail to jointly capture and align.
Approach
The method builds multiscale hypergraphs using Mahalanobis distance to model groupwise correlations, while a spatial-temporal transformer captures pairwise interactions, with a multimodal transformer aligning these heterogeneous features before trajectory decoding.
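The hypergraph construction step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `mahalanobis_knn_hypergraph`, the use of one hyperedge per pedestrian per scale, and the covariance pooled over all pedestrians in the scene are assumptions made here for clarity.

```python
import numpy as np

def mahalanobis_knn_hypergraph(features, scales=(2, 3)):
    """Build a multiscale hypergraph incidence matrix (illustrative sketch).

    features: (N, D) array-like, one row per pedestrian (e.g. position, velocity).
    scales:   group sizes; each must be <= N. For every scale k, each pedestrian
              spawns one hyperedge holding itself and its k - 1 nearest
              neighbours (an assumption, not taken from the paper).
    Returns H with shape (N, N * len(scales)); H[v, e] = 1 if v is in hyperedge e.
    """
    x = np.asarray(features, dtype=float)
    n = x.shape[0]
    # Pseudo-inverse keeps the sketch usable when the covariance is singular.
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(x, rowvar=False)))
    diff = x[:, None, :] - x[None, :, :]                     # (N, N, D)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)    # squared Mahalanobis
    np.fill_diagonal(d2, -np.inf)                            # self always sorts first
    order = np.argsort(d2, axis=1, kind="stable")
    blocks = []
    for k in scales:
        h = np.zeros((n, n))
        for centre in range(n):
            h[order[centre, :k], centre] = 1.0               # k members per hyperedge
        blocks.append(h)
    return np.hstack(blocks)
```

Stacking the per-scale blocks side by side gives a single incidence matrix in which small and large groups coexist, which is one simple way to realize "multiscale" group sizes.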
Key results
- Jointly models groupwise and pairwise social interactions across spatial-temporal domains
- Introduces a multimodal transformer to align heterogeneous interaction features
- Constructs context-aware multiscale hypergraphs via Mahalanobis distance-based KNN
- Consistently outperforms state-of-the-art baselines on public pedestrian trajectory datasets
Why it matters
Enables more reliable crowd behavior forecasting for safety-critical applications like autonomous driving and social robotics.
Abstract
Predicting crowd intentions and trajectories is critical for a range of real-world applications, including social robotics and autonomous driving. Accurately modeling such behavior remains challenging due to the complexity of pairwise spatial-temporal interactions and the heterogeneous influence of groupwise dynamics. To address these challenges, we propose Hyper-STTN, a Hypergraph-augmented Spatial-Temporal Transformer Network for crowd trajectory prediction. Hyper-STTN constructs crowd hypergraphs with multiscale group sizes to model groupwise correlations, captured through spectral hypergraph convolution based on hypergraph random walk. In parallel, a spatial-temporal transformer is employed to learn pedestrians’ pairwise latent interactions across multimodal dimensions. Finally, these heterogeneous groupwise and pairwise features are fused and aligned via a multimodal transformer. Extensive experiments on public pedestrian motion datasets demonstrate that Hyper-STTN consistently outperforms state-of-the-art baselines and ablation models. The project website is available at https://sites.google.com/view/hypersttn.
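To make the phrase "hypergraph convolution based on hypergraph random walk" concrete, here is a minimal sketch of a single propagation layer. The abstract does not give the exact operator, so this uses the standard hypergraph random-walk transition matrix P = D_v^{-1} H W D_e^{-1} H^T as a stand-in. The function names, the ReLU activation, and the unit hyperedge weights are assumptions.

```python
import numpy as np

def random_walk_matrix(H, edge_weights=None):
    """Transition matrix of the standard hypergraph random walk.

    H: (N, E) incidence matrix. Every vertex must belong to at least one
       hyperedge and no hyperedge may be empty, otherwise a degree is zero.
    """
    H = np.asarray(H, dtype=float)
    w = np.ones(H.shape[1]) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    d_v = H @ w            # vertex degree: total weight of incident hyperedges
    d_e = H.sum(axis=0)    # hyperedge degree: number of member vertices
    # P = D_v^{-1} H W D_e^{-1} H^T; each row sums to 1.
    return (H * (w / d_e)) @ H.T / d_v[:, None]

def hypergraph_conv(H, X, theta):
    """One propagation step: ReLU(P X theta), with theta a learnable (F, F') matrix."""
    P = random_walk_matrix(H)
    return np.maximum(P @ np.asarray(X, dtype=float) @ np.asarray(theta, dtype=float), 0.0)
```

Each row of P describes a walker that picks an incident hyperedge in proportion to its weight, then a member of that hyperedge uniformly at random, so a pedestrian's feature is replaced by an average over its group members.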