EES: A Data-Driven End-To-End Escorting System Via Spatiotemporal Feature Fusion
Youjin Yu, Junxiang Li, Bowen Li, Tao Wu, Huijing Zhao
AI summary
Problem
Current human-centered following methods rely strictly on tracking human movement, causing erratic vehicle motion during complex maneuvers and inflexible positioning that fails to adapt to dynamic environments.
Approach
The EES architecture uses a cross-modal attention gating network to fuse camera and LiDAR data with human trajectory history, dynamically balancing environmental perception and human intent to generate adaptive vehicle waypoints.
Key results
- Prediction error reduced by 41.2% overall and by 54.5% during cornering
- Dynamic relative positioning adjustment across diverse scenarios
- Superior performance over EKF and intention-based baselines in simulation and real-world tests
- Robust navigation in complex intersections and inconsistent motion scenarios
Why it matters
Provides a robust, adaptive escorting framework critical for enhancing safety and mission effectiveness in high-risk military and civilian unmanned vehicle operations.
Abstract
This letter presents a technique that allows unmanned vehicles to escort a human to their destination. Current human-centered following methods depend solely on human movement, which presents significant limitations. The complexity of human movement during tactical maneuvers can lead to erratic vehicle motion. Additionally, the static relative positioning between the human and the vehicle creates a rigid following pattern, constraining the vehicle’s ability to dynamically adjust its position for optimal coverage. To address these limitations, we propose a data-driven end-to-end escorting system (EES) that takes into account both environmental information and human movement to achieve adaptive escorting. We propose a soft-coding paradigm to replace traditional hard-coded intent modeling, addressing the inconsistency between human intention and vehicle motion, and we establish human-scene following through a cross-modal attention gating network. We conducted experiments in the CARLA simulator and in the real world. The results demonstrate that the proposed EES reduces prediction errors by 41.2% over the whole process and by 54.5% during cornering. Additionally, EES can operate from various positions and dynamically adjust the relative position between the human and the unmanned system to handle complex scenarios.
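
To make the idea of balancing scene information against human movement concrete, the following is a minimal sketch of a gated feature fusion. It is not the authors' cross-modal attention gating network: the function name `gated_fusion`, the per-dimension gate, and all weights are assumptions chosen for illustration, and the feature vectors stand in for whatever encoders produce the scene and human-trajectory features.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(scene_feat, human_feat, w_scene, w_human, bias):
    """Blend two equal-length feature vectors with a learned-style gate.

    Hypothetical illustration: for each dimension i,
        g_i     = sigmoid(w_scene[i] * s_i + w_human[i] * h_i + bias[i])
        fused_i = g_i * s_i + (1 - g_i) * h_i
    so g_i near 1 favors the scene feature and g_i near 0 favors
    the human-trajectory feature.
    """
    lengths = {len(scene_feat), len(human_feat),
               len(w_scene), len(w_human), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")

    fused, gates = [], []
    for s, h, ws, wh, b in zip(scene_feat, human_feat, w_scene, w_human, bias):
        g = sigmoid(ws * s + wh * h + b)
        gates.append(g)
        fused.append(g * s + (1.0 - g) * h)
    return fused, gates


# Made-up feature values, for illustration only.
scene = [1.0, 2.0]
human = [3.0, 4.0]

# Zero weights and bias give a gate of 0.5, i.e. a plain average.
balanced, balanced_gates = gated_fusion(scene, human, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])

# A large positive bias saturates the gate, so the scene feature dominates.
scene_led, scene_gates = gated_fusion(scene, human, [0.0, 0.0], [0.0, 0.0], [50.0, 50.0])
```

In a trained model the gate parameters would be learned from data, so the weighting between scene and human features could shift by situation (for example, leaning on scene features at an intersection); here they are fixed by hand only to show the two extremes of the blend.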