GSON: A Group-Based Social Navigation Framework with Large Multimodal Model
Shangyi Luo, Peng Sun, Ji Zhu, Yuhong Deng, Cunjun Yu, Anxing Xiao, Xueqian Wang
AI summary
Problem
Current navigation systems excel at obstacle avoidance but fail to recognize complex social contexts in dynamic crowds, often disrupting human interactions. Traditional methods lack the semantic understanding needed to generalize across open-world social scenarios.
Approach
GSON integrates a robust pedestrian tracking pipeline with a Large Multimodal Model that uses visual prompting to zero-shot identify social groups. A mid-level planner dynamically incorporates these group estimates into the cost map, guiding the robot around social spaces while preserving reactive local control.
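One way to picture how a group estimate could enter a cost map is sketched below. This is an illustration under assumptions, not the authors' planner: the function name `add_group_cost`, the circular social space, and the `margin` and `group_cost` values are all hypothetical, and the paper may represent group spaces differently.

```python
import math

def add_group_cost(costmap, members_xy, resolution=0.1, margin=0.5, group_cost=100.0):
    """Return a copy of costmap with raised cost inside a group's social space.

    Assumptions (not from the paper): the social space is a circle around the
    group centroid, and costmap[row][col] covers a square cell of side
    `resolution` metres with the grid origin at the corner of cell (0, 0).
    """
    # Centroid of the group members' positions.
    cx = sum(x for x, _ in members_xy) / len(members_xy)
    cy = sum(y for _, y in members_xy) / len(members_xy)
    # Radius reaches the farthest member, plus a comfort margin.
    radius = max(math.hypot(x - cx, y - cy) for x, y in members_xy) + margin

    out = [row[:] for row in costmap]  # leave the input grid untouched
    for r, row in enumerate(out):
        for c, cost in enumerate(row):
            # Cell centre in metres.
            x = (c + 0.5) * resolution
            y = (r + 0.5) * resolution
            if math.hypot(x - cx, y - cy) <= radius:
                row[c] = max(cost, group_cost)
    return out

# Two people standing 1 m apart on a 5 m x 5 m grid.
grid = [[0.0] * 50 for _ in range(50)]
updated = add_group_cost(grid, [(2.0, 2.0), (3.0, 2.0)])
```

Cells between the two people become expensive, so a planner that minimizes path cost would tend to route around the pair rather than between them.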
Key results
- Zero-shot social group detection via LMM visual prompting
- Mid-level planner dynamically avoids social spaces
- Real-world validation in queuing, conversation, and photo scenarios
- Significantly fewer social perturbations, with comparable performance on traditional navigation metrics
Why it matters
Enables service robots to operate respectfully and safely in human-centered environments by bridging semantic social understanding with practical navigation control.
Abstract
With the increasing presence of service robots and autonomous vehicles in human environments, navigation systems need to evolve beyond simple destination reach to incorporate social awareness. This paper introduces GSON, a novel group-based social navigation framework that leverages Large Multimodal Models (LMMs) to enhance robots’ social perception capabilities. Our approach uses visual prompting to enable zero-shot extraction of social relationships among pedestrians and integrates these results with robust pedestrian detection and tracking pipelines to overcome the inherent inference speed limitations of LMMs. The planning system incorporates a mid-level planner that sits between global path planning and local motion planning, effectively preserving both global context and reactive responsiveness while avoiding disruption of the predicted social group. We validate GSON through extensive real-world mobile robot navigation experiments involving complex social scenarios such as queuing, conversations, and photo sessions. Comparative results show that our system significantly outperforms existing navigation approaches in minimizing social perturbations while maintaining comparable performance on traditional navigation metrics.
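The abstract does not say how the tracking pipeline compensates for slow LMM inference. One plausible arrangement, sketched here purely as an assumption, is to cache the most recent group labels by track ID so that a fast tracker can keep group positions current between LMM queries. The class `GroupLabelCache` and its methods are hypothetical names, not part of GSON.

```python
class GroupLabelCache:
    """Keeps the latest LMM group assignment, keyed by tracker ID (assumed design)."""

    def __init__(self):
        self._group_of = {}  # track_id -> group_id

    def update_from_lmm(self, groups):
        """Store a new grouping; `groups` is a list of lists of track IDs."""
        self._group_of = {
            track_id: group_id
            for group_id, members in enumerate(groups)
            for track_id in members
        }

    def groups_for(self, tracked_positions):
        """Map current tracker output {track_id: (x, y)} to {group_id: [positions]}.

        Tracks that have not been labeled yet are skipped, as are labeled
        tracks that the tracker no longer reports.
        """
        result = {}
        for track_id, position in tracked_positions.items():
            group_id = self._group_of.get(track_id)
            if group_id is not None:
                result.setdefault(group_id, []).append(position)
        return result

# The LMM labeled tracks 1 and 2 as one group and track 3 as another.
cache = GroupLabelCache()
cache.update_from_lmm([[1, 2], [3]])
# A later tracker frame: track 3 has left, track 4 is new and unlabeled.
current = cache.groups_for({1: (0.0, 0.0), 2: (1.0, 0.0), 4: (5.0, 5.0)})
```

In this sketch the planner reads group positions at tracker rate, while the grouping itself changes only when a new LMM result arrives.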