Learn to Quantify Social Interaction with Constraints for Pedestrian Walking
Xiaodan Shi
AI summary
Problem
Current trajectory prediction models incorporate social interactions but treat them as a black box, failing to quantify or interpret the specific types of interactions and how they influence pedestrian decision-making, which limits robustness.
Approach
The authors propose "Learn to Cluster," a probabilistic latent variable method that automatically groups dynamic social interactions into discrete modes directly from trajectory data, regulated by a custom social loss function, and integrates these modes into a prediction network.
Key results
- Automatically clusters dynamic social interactions into interpretable modes without manual labels
- Integrates learned interaction patterns into a trajectory prediction model via pattern aggregation
- Achieves comparable or improved prediction accuracy on UCY and ETH benchmarks
- Provides interpretable interaction modes (e.g., aggressive, mild, no attention) validated through designed scenarios
Why it matters
Enhances the interpretability and robustness of pedestrian trajectory prediction, enabling safer and more reliable navigation for autonomous vehicles and social robots in crowded environments.
Abstract
Long-term human path forecasting in crowds is critical for autonomous moving platforms (such as self-driving cars and social robots) to avoid collisions and plan high-quality paths. Although current research takes social interactions into account for prediction, it does not reveal which kinds of social interactions occur among people or how those interactions affect pedestrians' decision-making, which in turn limits its robustness. Social interactions in pedestrian walking are intuitively numerous and difficult to label or quantify. In this paper, we explore how to quantify and interpret the way pedestrians interact with others by proposing Learn to Cluster. Our method for clustering social interactions is a probabilistic latent-variable generative model that learns directly from sequential trajectory observations and scales to an arbitrary number of pedestrians. Learn to Cluster is label-free and can be naturally integrated into the training process of the prediction model. The latent variables then serve as "labels" that categorize social interactions. Extensive experiments on several trajectory prediction benchmarks demonstrate that our method learns the patterns of social interactions and effectively integrates these patterns into pedestrian trajectory prediction.
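To make the idea of discrete interaction modes more concrete, the sketch below shows one generic way a categorical latent variable could be attached to each ego-neighbor pair and then pooled into a fixed-size vector, whatever the number of neighbors. It is not the paper's architecture, which the abstract does not specify. The function names, the 4-dimensional state `[x, y, vx, vy]`, the linear scoring weights `W` and `b`, and the choice of `K = 3` modes are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def interaction_mode_probs(ego, neighbors, W, b):
    """Hypothetical per-pair posterior over K interaction modes.

    ego:       (4,)   state [x, y, vx, vy] of the pedestrian being predicted
    neighbors: (N, 4) states of the N surrounding pedestrians
    W, b:      (4, K) and (K,) illustrative linear scoring parameters
    Returns (probs, rel) with shapes (N, K) and (N, 4).
    """
    rel = neighbors - ego              # relative position and velocity
    probs = softmax(rel @ W + b)       # one categorical distribution per neighbor
    return probs, rel

def aggregate_by_mode(probs, rel):
    # Mode-weighted sum over neighbors: (K, N) @ (N, 4) -> (K, 4).
    # The output size depends on K only, not on the number of neighbors.
    return probs.T @ rel

rng = np.random.default_rng(0)
K = 3                                  # assumed number of modes
W = rng.normal(size=(4, K))            # random weights stand in for learned ones
b = np.zeros(K)
ego = np.array([0.0, 0.0, 1.0, 0.0])

for n in (2, 5):
    neighbors = rng.normal(size=(n, 4))
    probs, rel = interaction_mode_probs(ego, neighbors, W, b)
    pooled = aggregate_by_mode(probs, rel)
    print(n, probs.shape, pooled.shape)
```

In a trained model the scoring parameters would be learned jointly with the predictor, and the pooled vector would be fed to the trajectory decoder. Here the random weights only demonstrate the shapes and the independence from crowd size.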