ICRA 2026

Learning Social Navigation from Positive and Negative Demonstrations and Rule-Based Specifications

Chanwoo Kim, JiHwan Yoon, Hyeonseong Kim, Taemoon Jeong, Changwoo Yoo, Seungbeen Lee, SooHwan Byeon, Hoon Chung, Matthew Pan, Jean Oh, Kyungjae Lee, Sungjoon Choi

AI summary

Integrating demonstration-derived rewards with rule-based safety constraints enables mobile robots to navigate dynamic human environments adaptively and safely in real time.

Social navigation, Reward learning, Rule-based safety, Uncertainty estimation, Mobile robotics, Imitation learning

Problem

Mobile robot navigation in crowded, human-shared environments struggles to balance adaptability to diverse human behaviors with strict safety constraints, as classical methods lack generalization and learning-based methods lack explicit safety guarantees.

Approach

The framework learns a density-based reward map from positive and negative human demonstrations, augments it with rule-based objectives for obstacle avoidance and goal reaching, and distills a sampling-based teacher policy into a compact, uncertainty-aware student policy for real-time deployment.
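This section does not specify the density estimator or how the terms are weighted, so the following is only a minimal sketch under stated assumptions. It models the demonstration-derived reward as a log-density ratio between Gaussian kernel density estimates of positive and negative demonstration positions, and adds two hypothetical rule-based terms: a linear penalty inside a safety radius around obstacles and a negative distance to the goal. The names `kde_2d` and `make_reward`, the `bandwidth` and `safe_radius` values, and the weights `w_density`, `w_obs`, `w_goal` are illustrative, not the authors' choices.

```python
import numpy as np


def kde_2d(points, bandwidth):
    """Return a function evaluating an isotropic 2-D Gaussian kernel density estimate."""
    points = np.asarray(points, dtype=float)
    norm = 1.0 / (len(points) * 2.0 * np.pi * bandwidth ** 2)

    def density(x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        # Squared distance from every query point to every sample: shape (Q, N).
        d2 = ((x[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        return norm * np.exp(-0.5 * d2 / bandwidth ** 2).sum(axis=1)

    return density


def make_reward(pos_demos, neg_demos, obstacles, goal, bandwidth=0.3,
                safe_radius=0.5, w_density=1.0, w_obs=10.0, w_goal=1.0, eps=1e-6):
    """Hypothetical combined reward: learned density term plus rule-based terms."""
    p_pos, p_neg = kde_2d(pos_demos, bandwidth), kde_2d(neg_demos, bandwidth)
    obstacles = np.atleast_2d(np.asarray(obstacles, dtype=float))
    goal = np.asarray(goal, dtype=float)

    def reward(x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        # Data-driven term: log ratio of positive to negative demonstration density.
        r_density = np.log(p_pos(x) + eps) - np.log(p_neg(x) + eps)
        # Rule-based safety term: linear penalty once inside the safety radius.
        d_obs = np.linalg.norm(x[:, None, :] - obstacles[None, :, :], axis=-1).min(axis=1)
        r_obs = -np.maximum(0.0, safe_radius - d_obs)
        # Rule-based goal term: negative Euclidean distance to the goal.
        r_goal = -np.linalg.norm(x - goal, axis=1)
        return w_density * r_density + w_obs * r_obs + w_goal * r_goal

    return reward


# Toy usage with synthetic "demonstrations" (not data from the paper).
rng = np.random.default_rng(0)
pos = rng.normal([0.0, 1.0], 0.2, size=(200, 2))   # e.g. passing on the left
neg = rng.normal([0.0, -1.0], 0.2, size=(200, 2))  # e.g. cutting through a group
reward = make_reward(pos, neg, obstacles=[[3.0, 0.0]], goal=[5.0, 0.0])
print(reward([[0.0, 1.0], [0.0, -1.0]]))  # the first state scores higher
```

Because the learned term is a log ratio, regions covered only by negative demonstrations are penalized even when they are obstacle-free, while the rule-based terms still apply where no demonstrations exist.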

Key results

Unified reward formulation combining demonstration-driven density learning with rule-based safety
Sampling-based lookahead teacher policy for adaptive, safe supervision
Uncertainty-aware distillation into a compact student policy for real-time deployment
Consistent gains in success rate and time efficiency over baselines in synthetic and elevator co-boarding simulations, with real-world trials demonstrating practical deployment
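To make the sampling-based lookahead teacher concrete, here is a minimal random-shooting controller, one common form of sampling-based model predictive control; the source does not say which sampler the authors use. The sketch assumes a unicycle motion model, uniformly sampled (linear, angular) velocity sequences, and a caller-supplied reward that scores each predicted position. The functions `rollout`, `lookahead_teacher`, `step`, and `toy_reward`, the horizon, and the velocity limits are all hypothetical.

```python
import numpy as np


def rollout(state, actions, dt=0.1):
    """Integrate a unicycle model from state (x, y, theta) under (v, omega) actions."""
    x, y, th = state
    positions = np.empty((len(actions), 2))
    for t, (v, w) in enumerate(actions):
        x += v * np.cos(th) * dt
        y += v * np.sin(th) * dt
        th += w * dt
        positions[t] = (x, y)
    return positions


def lookahead_teacher(state, reward_fn, horizon=20, n_samples=256,
                      v_max=1.0, w_max=1.5, rng=None):
    """Random-shooting lookahead: return the first action of the best sampled plan."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample candidate action sequences uniformly within the velocity limits.
    v = rng.uniform(0.0, v_max, size=(n_samples, horizon))
    w = rng.uniform(-w_max, w_max, size=(n_samples, horizon))
    plans = np.stack([v, w], axis=-1)  # shape (n_samples, horizon, 2)
    # Score every plan by the reward accumulated along its predicted trajectory.
    scores = np.array([reward_fn(rollout(state, plan)).sum() for plan in plans])
    best = int(np.argmax(scores))
    # Receding horizon: only the first action is executed, then the robot replans.
    return plans[best, 0]


def step(state, action, dt=0.1):
    """Advance the state by one control step with the same unicycle model."""
    x, y, th = state
    v, w = action
    return np.array([x + v * np.cos(th) * dt, y + v * np.sin(th) * dt, th + w * dt])


# Toy usage with a hypothetical rule-based reward: approach the goal, avoid one obstacle.
goal = np.array([5.0, 0.0])
obstacle = np.array([2.0, 0.5])


def toy_reward(positions):
    d_goal = np.linalg.norm(positions - goal, axis=1)
    d_obs = np.linalg.norm(positions - obstacle, axis=1)
    return -d_goal - 10.0 * np.maximum(0.0, 0.5 - d_obs)


rng = np.random.default_rng(0)
state = np.array([0.0, 0.0, 0.0])
for _ in range(30):
    state = step(state, lookahead_teacher(state, toy_reward, rng=rng))
```

Random shooting is the simplest sampler to show; its per-step cost grows with `n_samples * horizon`, which is one reason to distill such a teacher into a cheaper policy for real-time use.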

Why it matters

It provides a practical, deployable navigation framework for socially aware mobile robots operating in dynamic, human-shared spaces like elevators and crowded corridors.

Abstract

Mobile robot navigation in dynamic human environments requires policies that balance adaptability to diverse behaviors with compliance to safety constraints. We hypothesize that integrating data-driven rewards with rule-based objectives enables navigation policies to achieve a more effective balance of adaptability and safety. To this end, we develop a framework that learns a density-based reward from positive and negative demonstrations and augments it with rule-based objectives for obstacle avoidance and goal reaching. A sampling-based lookahead controller produces supervisory actions that are both safe and adaptive, which are subsequently distilled into a compact student policy that runs in real time and provides uncertainty estimates. Experiments in synthetic and elevator co-boarding simulations show consistent gains in success rate and time efficiency over baselines, and real-world demonstrations with human participants confirm the practicality of deployment. A video illustrating this work can be found on our project page https://chanwookim971024.github.io/PioneeR/.
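The abstract describes distilling the teacher's actions into a compact student that also reports uncertainty, but it does not say how that uncertainty is computed. One common way to obtain such estimates, used below purely as an assumption, is a bootstrap ensemble: several small regressors are fit to resampled (state, teacher action) pairs, and their disagreement serves as the uncertainty. The quadratic features stand in for a neural network, and `features`, `EnsembleStudent`, and the synthetic teacher labels are hypothetical.

```python
import numpy as np


def features(s):
    """Quadratic features of a 2-D state (a stand-in for a small neural network)."""
    s = np.atleast_2d(s)
    x, y = s[:, 0], s[:, 1]
    return np.stack([np.ones_like(x), x, y, x * x, y * y, x * y], axis=1)


class EnsembleStudent:
    """Bootstrap ensemble of ridge regressors imitating teacher actions."""

    def __init__(self, n_members=5, ridge=1e-3, seed=0):
        self.n_members = n_members
        self.ridge = ridge
        self.rng = np.random.default_rng(seed)
        self.weights = []

    def fit(self, states, teacher_actions):
        phi = features(states)
        n, d = phi.shape
        self.weights = []
        for _ in range(self.n_members):
            idx = self.rng.integers(0, n, size=n)  # bootstrap resample with replacement
            A, b = phi[idx], teacher_actions[idx]
            # Ridge regression: w = (A^T A + lambda I)^-1 A^T b
            self.weights.append(np.linalg.solve(A.T @ A + self.ridge * np.eye(d), A.T @ b))
        return self

    def predict(self, states):
        phi = features(states)
        preds = np.stack([phi @ w for w in self.weights])  # shape (members, queries, action_dim)
        # Mean is the distilled action; spread across members is the uncertainty.
        return preds.mean(axis=0), preds.std(axis=0)


# Toy usage: synthetic (v, omega) labels standing in for teacher queries.
rng = np.random.default_rng(1)
states = rng.uniform(-1.0, 1.0, size=(300, 2))
teacher = np.stack([np.tanh(states[:, 0]), np.sin(states[:, 1])], axis=1)
teacher += rng.normal(0.0, 0.05, size=teacher.shape)
student = EnsembleStudent().fit(states, teacher)
mean, std = student.predict(np.array([[0.0, 0.0], [5.0, 5.0]]))
```

States far from the distillation data, such as `[5.0, 5.0]` here, produce much larger ensemble disagreement than states inside it; a deployment loop could use that signal, for example, to slow down or defer to a conservative action, though that fallback rule is an illustration rather than something the source describes.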

Index terms

Learning from Demonstration; Reactive and Sensor-Based Planning