Human2Nav: Learning Crowd Navigation from Human Videos across Robots Via Feasibility-Guided Flow Matching
Shenghong Zhang, JunJie Chen, Sichi Yan, Yutong Ban, Xiao Li
AI summary
Problem
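Collecting large-scale robot navigation demonstrations on physical platforms is costly and unsafe, while directly transferring human video demonstrations to robots fails because of the embodiment gap: robots differ from humans both in how they observe the scene and in their kinematic and dynamic constraints.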
Approach
The framework aligns aerial human videos with robot sensors using a bird’s-eye-view representation, trains a conditional flow-matching model to capture crowd dynamics, and applies a training-free feasibility guidance mechanism at test time to enforce robot-specific constraints.
Key results
- Outperforms ORCA and SARL baselines in success rate and data efficiency
- Enables training-free adaptation to heterogeneous robot kinematics and dynamics
- Achieves safe, executable navigation across four diverse crowd scenarios in simulation
- Successfully deployed on differential drive and quadruped robots in real-world crowded environments
Why it matters
Provides a scalable, platform-agnostic solution for robot crowd navigation that eliminates costly teleoperation data collection and accelerates real-world deployment.
Abstract
Enabling robots to navigate safely and efficiently in dynamic, crowded environments requires learning from large-scale demonstrations, which are costly and unsafe to collect on physical platforms. While human videos offer a rich and scalable alternative, transferring these motion patterns to robots is challenged by the embodiment gap across observation and action spaces. This paper presents Human2Nav, a data-efficient framework that learns navigation policies directly from human videos via test-time feasibility-guided flow matching. Human2Nav employs a bird’s-eye-view representation to align visual observations and trains a conditional flow matching model to capture nuanced human navigation patterns. Crucially, we introduce a training-free feasibility guidance mechanism that during inference steers generated trajectories to satisfy heterogeneous robot-specific kinematic and dynamic constraints without retraining. Extensive experiments in simulation and on real-world heterogeneous robotic platforms demonstrate that Human2Nav achieves superior data efficiency and navigation performance compared to model-based and learning-based baselines, while ensuring safe and executable trajectories across diverse crowd scenarios.
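To make the idea of test-time feasibility guidance concrete, the following is a minimal sketch and not the authors' implementation. It assumes a straight-line flow-matching path, replaces the trained network with a hypothetical closed-form velocity field (`toy_velocity`), and uses a simple speed-limit projection (`project_speed_limit`) as a stand-in for the paper's guidance mechanism, whose exact form the abstract does not specify. All function names, parameters, and numeric values are illustrative assumptions.

```python
import numpy as np

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a trained conditional flow-matching model.
    For the straight-line path x_t = (1 - t) * x0 + t * x1, the conditional
    velocity that transports x_t to x1 is (x1 - x_t) / (1 - t)."""
    return (target - x) / max(1.0 - t, 1e-3)

def project_speed_limit(traj, start, v_max, dt):
    """Assumed feasibility step: shrink every waypoint-to-waypoint displacement
    to at most v_max * dt, then rebuild the waypoints from `start`."""
    steps = np.diff(np.vstack([start, traj]), axis=0)
    norms = np.linalg.norm(steps, axis=1, keepdims=True)
    scale = np.minimum(1.0, v_max * dt / np.maximum(norms, 1e-9))
    return start + np.cumsum(steps * scale, axis=0)

def sample_trajectory(start, target, v_max, dt, n_flow_steps=20, seed=0):
    """Euler-integrate the flow from noise toward a human-like trajectory,
    applying the robot-specific projection after every integration step."""
    rng = np.random.default_rng(seed)
    x = start + rng.standard_normal(target.shape)  # initial noise sample
    h = 1.0 / n_flow_steps
    for k in range(n_flow_steps):
        t = k * h
        x = x + h * toy_velocity(x, t, target)        # flow ODE step
        x = project_speed_limit(x, start, v_max, dt)  # training-free guidance
    return x

# Illustrative numbers: a human-like path at 1.5 m/s, a robot limited to 0.5 m/s.
dt, horizon = 0.4, 8
start = np.zeros(2)
human_like = start + np.outer(np.arange(1, horizon + 1) * dt * 1.5, [1.0, 0.0])
traj = sample_trajectory(start, human_like, v_max=0.5, dt=dt)
step_lengths = np.linalg.norm(np.diff(np.vstack([start, traj]), axis=0), axis=1)
```

Because the projection only touches the sampling loop, swapping in a different robot amounts to changing `v_max` (or the projection function) without retraining the velocity model, which is the property the abstract describes as training-free adaptation.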