Learning Social Navigation from Positive and Negative Demonstrations and Rule-Based Specifications
Chanwoo Kim, JiHwan Yoon, Hyeonseong Kim, Taemoon Jeong, Changwoo Yoo, Seungbeen Lee, SooHwan Byeon, Hoon Chung, Matthew Pan, Jean Oh, Kyungjae Lee, Sungjoon Choi
AI summary
Problem
Mobile robot navigation in crowded, human-shared environments struggles to balance adaptability to diverse human behaviors with strict safety constraints, as classical methods lack generalization and learning-based methods lack explicit safety guarantees.
Approach
The framework learns a density-based reward map from positive and negative human demonstrations, augments it with rule-based objectives for obstacle avoidance and goal reaching, and distills a sampling-based teacher policy into a compact, uncertainty-aware student policy for real-time deployment.
Key results
- Unified reward formulation combining demonstration-driven density learning with rule-based safety
- Sampling-based lookahead teacher policy for adaptive, safe supervision
- Uncertainty-aware distillation into a compact student policy for real-time deployment
- Consistent success rate and time efficiency gains in elevator co-boarding simulations and real-world trials
Why it matters
It provides a practical, deployable navigation framework for socially aware mobile robots operating in dynamic, human-shared spaces like elevators and crowded corridors.
Abstract
Mobile robot navigation in dynamic human environments requires policies that balance adaptability to diverse behaviors with compliance to safety constraints. We hypothesize that integrating data-driven rewards with rule-based objectives enables navigation policies to achieve a more effective balance of adaptability and safety. To this end, we develop a framework that learns a density-based reward from positive and negative demonstrations and augments it with rule-based objectives for obstacle avoidance and goal reaching. A sampling-based lookahead controller produces supervisory actions that are both safe and adaptive, which are subsequently distilled into a compact student policy suitable for real-time operation with uncertainty estimates. Experiments in synthetic and elevator co-boarding simulations show consistent gains in success rate and time efficiency over baselines, and real-world demonstrations with human participants confirm the practicality of deployment. A video illustrating this work can be found on our project page https://chanwookim971024.github.io/PioneeR/.
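
To make the pipeline in the abstract concrete, the following is a minimal sketch of how a demonstration-driven density reward could be combined with rule-based terms and used by a sampling-based lookahead controller. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the Gaussian kernel density estimate, the reward form (positive density minus negative density), the weights `w_demo`, `w_obs`, `w_goal`, the `safe_radius`, the single-integrator motion model, and the function names `kde`, `reward`, and `lookahead_action`. The distillation step into a student policy is not shown.

```python
import numpy as np


def kde(points, query, bandwidth=0.5):
    """Mean Gaussian kernel value of each query point (N, 2) under points (M, 2).

    Assumption: an unnormalized kernel density estimate stands in for the
    learned density; the source does not specify the density model.
    """
    d2 = ((query[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / bandwidth**2).mean(axis=1)


def reward(xy, pos_demos, neg_demos, obstacles, goal,
           w_demo=1.0, w_obs=5.0, w_goal=1.0, safe_radius=0.5):
    """Demonstration term plus rule-based terms for positions xy (N, 2).

    All weights and the functional form are hypothetical.
    """
    # Data-driven term: high near positive demos, low near negative demos.
    demo = kde(pos_demos, xy) - kde(neg_demos, xy)
    # Rule-based obstacle term: penalize entering the safety radius.
    d_obs = np.linalg.norm(xy[:, None, :] - obstacles[None, :, :], axis=-1).min(axis=1)
    obs_penalty = np.maximum(0.0, safe_radius - d_obs)
    # Rule-based goal term: penalize distance to the goal.
    d_goal = np.linalg.norm(xy - goal, axis=-1)
    return w_demo * demo - w_obs * obs_penalty - w_goal * d_goal


def lookahead_action(state, pos_demos, neg_demos, obstacles, goal,
                     n_samples=256, horizon=10, dt=0.2, v_max=1.0, rng=None):
    """Sample constant-velocity candidates, roll them out, return the best one."""
    rng = np.random.default_rng() if rng is None else rng
    speeds = rng.uniform(0.0, v_max, n_samples)
    headings = rng.uniform(-np.pi, np.pi, n_samples)
    vel = np.stack([speeds * np.cos(headings), speeds * np.sin(headings)], axis=1)

    xy = np.repeat(state[None, :], n_samples, axis=0)
    returns = np.zeros(n_samples)
    for _ in range(horizon):
        xy = xy + vel * dt  # single-integrator rollout (assumption)
        returns += reward(xy, pos_demos, neg_demos, obstacles, goal)
    return vel[np.argmax(returns)]
```

In this sketch the rule-based terms act on every sampled rollout, so a candidate that follows the demonstrations but cuts through the safety radius is still penalized. A teacher of this kind would be queried offline to label states with actions, and those labels would then supervise the compact student policy described in the abstract.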