ICRA 2026

EES: A Data-Driven End-To-End Escorting System Via Spatiotemporal Feature Fusion

Youjin Yu, Junxiang Li, Bowen Li, Tao Wu, Huijing Zhao

AI summary

EES fuses environmental cues with human movement to enable stable, adaptive vehicle escorting, reducing prediction error by 41.2% overall and by 54.5% during cornering.
Human-robot collaboration, autonomous escorting, end-to-end navigation, cross-modal attention, adaptive following, spatiotemporal fusion

Problem

Current human-centered following methods rely strictly on tracking human movement, causing erratic vehicle motion during complex maneuvers and inflexible positioning that fails to adapt to dynamic environments.

Approach

The EES architecture uses a cross-modal attention gating network to fuse camera and LiDAR data with human trajectory history, dynamically balancing environmental perception and human intent to generate adaptive vehicle waypoints.
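
The gating idea can be illustrated with a minimal sketch. This is not the authors' network: it assumes the scene (camera and LiDAR) and human-trajectory inputs have already been encoded into feature vectors of equal length, and it uses a plain sigmoid gate in place of the paper's cross-modal attention gating. The names gated_fusion, scene_feat, human_feat, w_gate and b_gate are hypothetical, and the weights are random rather than learned.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(scene_feat, human_feat, w_gate, b_gate):
    """Blend two feature vectors with a per-channel gate.

    Hypothetical stand-in for a learned gating layer: the gate is
    computed from both inputs, and each channel of the output is a
    convex combination of the scene and human features.
    """
    joint = np.concatenate([scene_feat, human_feat])       # shape (2d,)
    gate = sigmoid(w_gate @ joint + b_gate)                # shape (d,), values in (0, 1)
    fused = gate * scene_feat + (1.0 - gate) * human_feat  # shape (d,)
    return fused, gate


# Toy inputs; in a real system these would come from trained encoders.
rng = np.random.default_rng(0)
d = 8
scene_feat = rng.normal(size=d)   # assumed encoding of camera and LiDAR data
human_feat = rng.normal(size=d)   # assumed encoding of human trajectory history
w_gate = rng.normal(scale=0.1, size=(d, 2 * d))
b_gate = np.zeros(d)

fused, gate = gated_fusion(scene_feat, human_feat, w_gate, b_gate)
```

A gate value near 1 in a channel means the fused feature follows the scene encoding there, and a value near 0 means it follows the human encoding. That is one simple way to picture "dynamically balancing environmental perception and human intent"; a waypoint decoder would then consume the fused vector.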

Key results

  • Prediction error reduced by 41.2% overall and by 54.5% during cornering
  • Dynamic relative positioning adjustment across diverse scenarios
  • Superior performance over EKF and intention-based baselines in simulation and real-world tests
  • Robust navigation in complex intersections and inconsistent motion scenarios

Why it matters

Provides a robust, adaptive escorting framework critical for enhancing safety and mission effectiveness in high-risk military and civilian unmanned vehicle operations.

Abstract

This letter presents a technique that allows unmanned vehicles to escort a human to their destinations. Current human-centered following methods depend solely on human movement, which presents significant limitations. The complexity of human movement during tactical maneuvers can lead to erratic vehicle motion. Additionally, the static relative positioning between the human and vehicle creates a rigid following pattern, thereby constraining the vehicle’s ability to dynamically adjust its position for optimal coverage. To address these limitations, we propose a data-driven end-to-end escorting system (EES) that takes into account both environmental information and human movement to achieve adaptive escorting. We propose a soft-coding paradigm to replace the traditional hard-coding intent modeling to address the inconsistency of human intention and vehicle motion, and establish human-scene following through a cross-modal attention gating network. We conducted experiments in the CARLA simulation and the real world. The results demonstrate that the proposed EES reduces prediction errors by 41.2% during overall processes and by 54.5% during cornering. Additionally, EES can adapt to various positions and dynamically adjust the relative positions between humans and unmanned systems to adapt to complex scenarios.

Index terms

Human-Robot Collaboration, Autonomous Vehicle Navigation, Motion and Path Planning
