ICRA 2026

SPLC: Social Preference Learning for Crowd Robot Navigation

Zixuan Chen, Hao Fu, Haiwen Hu, Shiquan Zheng

AI summary

SPLC automatically learns socially compliant navigation rewards from trajectory preferences, removing the need for manual reward design. It outperforms baselines in simulation and is validated in real-world tests.

Crowd robot navigation, offline reinforcement learning, social preference learning, reward modeling, human-robot interaction, preference transformer

Problem

Designing effective, socially compliant reward functions for offline reinforcement learning in crowd robot navigation is difficult due to unpredictable pedestrian dynamics and the high cost/subjectivity of manual reward engineering or human preference labeling.

Approach

SPLC introduces a social preference feedback mechanism that automatically generates preference labels using quantified social norms (collision avoidance, goal progress, risk exposure). These labels train a preference transformer that models rewards for offline RL algorithms.
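The labeling idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `Step` type, the distance thresholds, the weights, and the function names are all hypothetical, chosen only to show how collision avoidance, goal progress, and risk exposure could be combined into an automatic preference label for a pair of trajectory segments. The Bradley-Terry cross-entropy at the end is the loss commonly used to train preference-based reward models; whether SPLC uses exactly this form is an assumption.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Step:
    """One timestep of a trajectory segment (hypothetical layout)."""
    robot_xy: Point
    pedestrians_xy: List[Point]


def social_score(segment: List[Step], goal_xy: Point,
                 collision_dist: float = 0.3, risk_dist: float = 1.0,
                 weights: Tuple[float, float, float] = (10.0, 1.0, 0.5)) -> float:
    """Score a segment on three criteria; thresholds and weights are assumed."""
    w_collision, w_goal, w_risk = weights
    collisions = 0
    risk = 0.0
    for step in segment:
        # Distance to the nearest pedestrian (infinite if none are present).
        d_min = min((math.dist(step.robot_xy, p) for p in step.pedestrians_xy),
                    default=math.inf)
        if d_min < collision_dist:
            collisions += 1               # collision avoidance criterion
        elif d_min < risk_dist:
            risk += risk_dist - d_min     # risk exposure criterion
    # Goal progress: how much closer the robot ends up to the goal.
    progress = (math.dist(segment[0].robot_xy, goal_xy)
                - math.dist(segment[-1].robot_xy, goal_xy))
    return w_goal * progress - w_collision * collisions - w_risk * risk


def preference_label(seg_a: List[Step], seg_b: List[Step], goal_xy: Point,
                     margin: float = 1e-6) -> float:
    """Return 1.0 if seg_a is preferred, 0.0 if seg_b is, 0.5 on a tie."""
    score_a = social_score(seg_a, goal_xy)
    score_b = social_score(seg_b, goal_xy)
    if abs(score_a - score_b) <= margin:
        return 0.5
    return 1.0 if score_a > score_b else 0.0


def bradley_terry_loss(reward_sum_a: float, reward_sum_b: float,
                       label: float) -> float:
    """Cross-entropy between the label and P(a preferred) = sigmoid(r_a - r_b)."""
    p_a = 1.0 / (1.0 + math.exp(-(reward_sum_a - reward_sum_b)))
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p_a + eps)
             + (1.0 - label) * math.log(1.0 - p_a + eps))
```

In this sketch the labels come entirely from the scoring rule, so no human annotator is involved. A reward model would then be fit by minimizing `bradley_terry_loss` over many labeled segment pairs, with `reward_sum_a` and `reward_sum_b` supplied by the model's predicted rewards for each segment.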

Key results

Automatically generates preference labels without human annotators using principled evaluation criteria.
Successfully models implicit social norms and mitigates reward bias in offline RL navigation.
Consistently outperforms state-of-the-art baselines across standard simulation metrics.
Validates effectiveness in real-world experiments on a TurtleBot4 platform.

Why it matters

Enables safe, socially compliant robot navigation in dynamic human environments without costly manual reward design or human-in-the-loop training.

Abstract

Offline reinforcement learning (RL) holds significant potential for crowd robot navigation in human-robot coexistence applications. However, the inherent complexity of pedestrian motion renders the design of effective reward functions for promoting socially compliant robot behaviors a persistent challenge. This paper proposes a Social Preference Learning for Crowd Robot Navigation (SPLC) algorithm to eliminate the need for detailed reward design. Its core innovation lies in the introduction of a social preference feedback mechanism to automatically generate preference data through principled preference evaluation criteria. By explicitly accounting for the intricacies of pedestrian dynamics, the pipeline mitigates reward bias and facilitates the systematic quantification of broad social norms, thereby fostering socially compliant behaviors. Extensive experiments integrating SPLC with offline RL methods demonstrate consistent improvements over state-of-the-art baselines across standard performance metrics. Furthermore, real-world experiments on the TurtleBot4 validate the effectiveness of SPLC in practical human-robot coexistence settings. Our code and video demos are available at https://github.com/sklus949/SPLC.

Index terms

Motion and Path Planning, Reinforcement Learning, Collision Avoidance