ICRA 2026

Learn to Quantify Social Interaction with Constraints for Pedestrian Walking

Xiaodan Shi

AI summary

A label-free clustering framework automatically quantifies and interprets hidden social interaction patterns in pedestrian crowds and feeds them into trajectory prediction, improving model transparency while keeping accuracy comparable or better.

social interactions, trajectory prediction, learn to cluster, interpretability, autonomous driving, pedestrian modeling

Problem

Current trajectory prediction models incorporate social interactions but treat them as a black box, failing to quantify or interpret the specific types of interactions and how they influence pedestrian decision-making, which limits robustness.

Approach

The authors propose "Learn to Cluster," a probabilistic latent variable method that automatically groups dynamic social interactions into discrete modes directly from trajectory data, regulated by a custom social loss function, and integrates these modes into a prediction network.
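To make the idea of grouping interactions into discrete modes concrete, here is a minimal sketch, not the authors' implementation. It assumes each ordered pair of pedestrians is described by relative position and velocity, and that a linear layer followed by a softmax gives a categorical distribution over K modes. The function names, the feature choice, and the random (untrained) weights `W` and `b` are all hypothetical. In the paper the assignment is learned by a generative model regulated by a social loss, which this sketch does not reproduce.

```python
import numpy as np

def pairwise_interaction_features(pos, vel):
    """pos, vel: (N, 2) arrays. Returns (N, N, 4); entry [i, j] describes j relative to i."""
    rel_pos = pos[None, :, :] - pos[:, None, :]   # [i, j] = pos[j] - pos[i]
    rel_vel = vel[None, :, :] - vel[:, None, :]   # [i, j] = vel[j] - vel[i]
    return np.concatenate([rel_pos, rel_vel], axis=-1)

def soft_mode_assignment(features, W, b):
    """Categorical distribution over K interaction modes for every ordered pair."""
    logits = features @ W + b                           # (N, N, K)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

def aggregate_by_mode(features, probs):
    """For each pedestrian i, pool neighbour features per mode -> (N, K, F)."""
    n = features.shape[0]
    mask = 1.0 - np.eye(n)                  # exclude self-pairs (i, i)
    weighted = probs * mask[:, :, None]     # (N, N, K)
    return np.einsum("ijk,ijf->ikf", weighted, features)

# Toy scene: 5 pedestrians, 3 hypothetical modes, random untrained weights.
rng = np.random.default_rng(0)
N, K = 5, 3
pos = rng.normal(size=(N, 2))
vel = rng.normal(size=(N, 2))
W = rng.normal(size=(4, K))   # assumption: stands in for learned parameters
b = np.zeros(K)

feats = pairwise_interaction_features(pos, vel)
probs = soft_mode_assignment(feats, W, b)
modes = probs.argmax(axis=-1)            # hard mode "label" per ordered pair
pooled = aggregate_by_mode(feats, probs)  # per-pedestrian, per-mode summary
```

Because every step operates on pairwise arrays, the same code runs for any number of pedestrians, which mirrors the scalability the approach describes. The `pooled` array is one plausible shape for what a downstream prediction network could consume.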

Key results

Automatically clusters dynamic social interactions into interpretable modes without manual labels
Integrates learned interaction patterns into a trajectory prediction model via pattern aggregation
Achieves comparable or improved prediction accuracy on UCY and ETH benchmarks
Provides interpretable interaction modes (e.g., aggressive, mild, no attention) validated through designed scenarios
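This summary does not say which accuracy metrics were used. On the ETH and UCY benchmarks, results are commonly reported as average displacement error (ADE) and final displacement error (FDE), so the sketch below assumes those two. The function name `ade_fde` and the toy trajectories are illustrative only.

```python
import numpy as np

def ade_fde(pred, truth):
    """pred, truth: (T, 2) arrays of predicted and ground-truth positions.

    ADE is the mean Euclidean error over all T steps;
    FDE is the Euclidean error at the final step.
    """
    dist = np.linalg.norm(pred - truth, axis=-1)  # (T,) per-step error
    return float(dist.mean()), float(dist[-1])

# Toy example: the prediction drifts sideways by 0, 1, then 2 units.
truth = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = truth + np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
ade, fde = ade_fde(pred, truth)  # ade = 1.0, fde = 2.0
```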

Why it matters

Enhances the interpretability and robustness of pedestrian trajectory prediction, enabling safer and more reliable navigation for autonomous vehicles and social robots in crowded environments.

Abstract

Long-term human path forecasting in crowds is critical for autonomous moving platforms (such as self-driving cars and social robots) to avoid collisions and plan well. Although current research takes social interactions into account for prediction, it does not reveal the exact kinds of social interactions that occur among people or how those interactions affect pedestrians' decision-making, which limits its robustness. Social interactions in pedestrian walking are intuitively numerous and hard to label and quantify. In this paper, we explore how to quantify and interpret the way pedestrians interact with others by proposing Learn to Cluster. Our method for clustering social interactions is a probabilistic latent variable generative model that learns directly from sequential trajectory observations and scales to an arbitrary number of pedestrians. Learn to Cluster is label-free and can be naturally integrated into the training process of the prediction model. The latent variables then serve as 'labels' to categorize social interactions. Extensive experiments on several trajectory prediction benchmarks demonstrate that our method is able to learn the patterns of social interactions and effectively integrate them into pedestrian trajectory prediction.

Index terms

Motion and Path Planning, Deep Learning Methods, Intention Recognition