ICRA 2026

Human2Nav: Learning Crowd Navigation from Human Videos across Robots Via Feasibility-Guided Flow Matching

Shenghong Zhang, JunJie Chen, Sichi Yan, Yutong Ban, Xiao Li

AI summary

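Robots can learn safe, efficient crowd navigation directly from passive human videos: a flow matching policy is trained on the videos, and a training-free feasibility guidance step at test time adapts its trajectories to each robot. The approach outperforms model-based and learning-based baselines.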

crowd navigation, flow matching, human video imitation, feasibility guidance, robot learning, sim-to-real transfer

Problem

Collecting large-scale robot navigation data in crowds is costly and unsafe. Human videos are a scalable alternative, but transferring them directly to robots fails because of the embodiment gap: robots observe the scene differently and are bound by their own kinematic and dynamic constraints.

Approach

The framework aligns aerial human videos with robot sensors using a bird’s-eye-view representation, trains a conditional flow-matching model to capture crowd dynamics, and applies a training-free feasibility guidance mechanism at test time to enforce robot-specific constraints.
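One way to picture test-time feasibility guidance is the toy sketch below. It is not the paper's implementation: the velocity field `toy_velocity` is a closed-form stand-in for a learned model, the feasibility cost is a simple per-step distance limit (a speed limit), and guidance is applied by correcting the predicted end-of-flow trajectory with a few gradient steps. All function names, parameters, and numbers here are assumptions made for illustration.

```python
import numpy as np

def feasibility_cost_and_grad(waypoints, start, max_step):
    """Penalty for segments longer than max_step (a speed limit), with its gradient."""
    pts = np.vstack([start, waypoints])              # (H+1, 2); the start point is fixed
    seg = np.diff(pts, axis=0)                       # (H, 2) displacement per step
    length = np.maximum(np.linalg.norm(seg, axis=1), 1e-9)
    excess = np.maximum(length - max_step, 0.0)
    cost = float(np.sum(excess ** 2))
    g_seg = (2.0 * excess / length)[:, None] * seg   # d cost / d segment
    grad = g_seg.copy()                              # segment i ends at waypoint i
    grad[:-1] -= g_seg[1:]                           # segment i+1 starts at waypoint i
    return cost, grad

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned v_theta(x, t | context): the field that
    moves any sample to `target` along a straight line as t goes from 0 to 1."""
    return (target - x) / (1.0 - t)

def sample(target, start, max_step, guide, n_steps=10, inner_steps=50, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=target.shape)                # noise trajectory at t = 0
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = toy_velocity(x, t, target)
        if guide:
            x1 = x + (1.0 - t) * v                   # predicted end-of-flow trajectory
            for _ in range(inner_steps):             # gradient steps toward feasibility
                _, g = feasibility_cost_and_grad(x1, start, max_step)
                x1 = x1 - lr * g
            v = (x1 - x) / (1.0 - t)                 # velocity toward corrected endpoint
        x = x + dt * v                               # Euler step of the flow ODE
    return x

start = np.zeros(2)
# Toy "human-like" trajectory: 8 waypoints, 1.0 unit apart (too fast for the robot).
target = np.stack([np.arange(1.0, 9.0), np.zeros(8)], axis=1)
plain = sample(target, start, max_step=0.5, guide=False)
guided = sample(target, start, max_step=0.5, guide=True)
cost_plain, _ = feasibility_cost_and_grad(plain, start, 0.5)
cost_guided, _ = feasibility_cost_and_grad(guided, start, 0.5)
print("violation without guidance:", cost_plain)
print("violation with guidance:   ", cost_guided)
```

Because the constraint enters only through `feasibility_cost_and_grad`, swapping in a different robot (for example, a different `max_step`) changes the sampling loop's behaviour without touching the velocity model, which is the sense in which such guidance is training-free.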

Key results

Outperforms ORCA and SARL baselines in success rate and data efficiency
Enables training-free adaptation to heterogeneous robot kinematics and dynamics
Achieves safe, executable navigation across four diverse crowd scenarios in simulation
Successfully deployed on differential drive and quadruped robots in real-world crowded environments

Why it matters

Offers a scalable approach to robot crowd navigation that replaces costly, unsafe on-robot data collection with human videos and adapts to different robot platforms without retraining.

Abstract

Enabling robots to navigate safely and efficiently in dynamic, crowded environments requires learning from large-scale demonstrations, which are costly and unsafe to collect on physical platforms. While human videos offer a rich and scalable alternative, transferring these motion patterns to robots is challenged by the embodiment gap across observation and action spaces. This paper presents Human2Nav, a data-efficient framework that learns navigation policies directly from human videos via test-time feasibility-guided flow matching. Human2Nav employs a bird’s-eye-view representation to align visual observations and trains a conditional flow matching model to capture nuanced human navigation patterns. Crucially, we introduce a training-free feasibility guidance mechanism that during inference steers generated trajectories to satisfy heterogeneous robot-specific kinematic and dynamic constraints without retraining. Extensive experiments in simulation and on real-world heterogeneous robotic platforms demonstrate that Human2Nav achieves superior data efficiency and navigation performance compared to model-based and learning-based baselines, while ensuring safe and executable trajectories across diverse crowd scenarios.
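For readers unfamiliar with conditional flow matching, a commonly used form of the training objective is given here as background. It assumes a linear interpolation path between noise and data; the abstract does not state which path or loss weighting Human2Nav uses, so the symbols ($x_0$ for a noise sample, $x_1$ for a demonstrated trajectory, $c$ for the conditioning observation, $v_\theta$ for the learned velocity field) are illustrative.

```latex
x_t = (1 - t)\,x_0 + t\,x_1, \qquad t \sim \mathcal{U}[0, 1], \quad x_0 \sim \mathcal{N}(0, I)

\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,(x_1, c)}
  \left\| v_\theta(x_t, t, c) - (x_1 - x_0) \right\|^2

\frac{\mathrm{d}x}{\mathrm{d}t} = v_\theta(x, t, c), \qquad x(0) = x_0
```

At inference, a trajectory is obtained by integrating the last equation from $t = 0$ to $t = 1$.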

Index terms

Imitation Learning; Motion and Path Planning; Transfer Learning