STAF-Navi: Vision-Based Spatio-Temporal Attention Fusion Navigation Framework
Haowen Zhang, Fanghong Liu, Chaoyu Zhang, Qiuze Yu
AI summary
Problem
Current deep reinforcement learning agents for UAVs lack long-term memory, causing them to struggle with partial observability, target retention, and dynamic obstacle avoidance in cluttered environments.
Approach
STAF-Navi fuses historical depth images and flight states using a Transformer-based actor and GRU-based critic, while a deep collision encoder compresses spatial data into latent obstacle representations for real-time decision-making.
Key results
- 10% increase in simulation navigation success rate
- 7% improvement in path efficiency
- Optimal temporal window of H=20 with exponential weighting
- Successful sim-to-real deployment in real-world UAV tests
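The summary does not say how the exponential weighting over the H=20 window is defined. As a minimal sketch of one plausible reading, the snippet below assigns each past frame a weight that decays exponentially with its age and normalizes the weights to sum to one. The function names, the `decay` rate, and the convention that index 0 is the most recent frame are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def exponential_history_weights(horizon=20, decay=0.1):
    """Normalized weights over a history window (hypothetical scheme).

    Index 0 is the most recent frame; older frames get smaller weights.
    `decay` is an assumed hyperparameter, not a value from the paper.
    """
    ages = np.arange(horizon)          # 0, 1, ..., horizon - 1
    weights = np.exp(-decay * ages)    # exponential decay with age
    return weights / weights.sum()     # normalize so the weights sum to 1

def weighted_history_summary(history, decay=0.1):
    """Collapse a (H, d) stack of per-frame features into one (d,) vector."""
    history = np.asarray(history, dtype=float)
    weights = exponential_history_weights(history.shape[0], decay)
    return weights @ history

# Example: 20 frames of 3-dimensional features, all equal to 2.0.
example_history = np.full((20, 3), 2.0)
example_summary = weighted_history_summary(example_history)
```

Because the weights are normalized, a constant history is left unchanged by the summary, which makes for an easy sanity check. A larger `decay` concentrates the weight on the newest frames, while `decay=0` reduces to a plain average over the window.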
Why it matters
Provides a robust, memory-augmented navigation solution for autonomous UAVs operating in complex, dynamic, and partially observable environments.
Abstract
In cluttered, unknown, and partially observable environments, Uncrewed Aerial Vehicle (UAV) navigation encounters formidable challenges. To address these challenges, we propose an innovative spatio-temporal attention fusion navigation framework called STAF-Navi. The framework integrates spatio-temporal attention mechanisms to model sequential dependencies. It captures spatial and temporal correlations from historical observations and actions to improve navigation and obstacle avoidance. STAF-Navi employs deep collision encoding to compress high-dimensional depth images into informative low-dimensional latent states, and a single-site Transformer to model historical sensor inputs and states, enhancing the utility of current observations. By exploiting temporal dependencies, this integration enables early braking and stable hovering. Extensive simulation experiments show that the framework increases the navigation success rate by 10% and improves path efficiency by 7%. Finally, the successful deployment of the proposed strategy in real-world scenarios validates its effectiveness.
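To make the idea of attending over historical latent states concrete, here is a minimal sketch of single-head scaled dot-product attention in which the current latent state queries a window of past latent states. This is a generic attention step, not the STAF-Navi architecture: the dimensions, the random projection matrices, and the function name `attend_over_history` are assumptions chosen for illustration, and the latent states stand in for whatever the collision encoder would produce.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - x.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def attend_over_history(current, history, w_q, w_k, w_v):
    """Single-head scaled dot-product attention (illustrative only).

    current: (d,) latent state for the present time step
    history: (H, d) latent states for the last H time steps
    w_q, w_k, w_v: (d, d_k) projection matrices (hypothetical, untrained)
    Returns the (d_k,) context vector and the (H,) attention weights.
    """
    query = current @ w_q                          # (d_k,)
    keys = history @ w_k                           # (H, d_k)
    values = history @ w_v                         # (H, d_k)
    scores = keys @ query / np.sqrt(query.shape[0])
    attn = softmax(scores)                         # weights over time steps
    context = attn @ values                        # weighted mix of values
    return context, attn

# Assumed sizes: window H=20, latent dimension 32, projection dimension 16.
rng = np.random.default_rng(0)
H, d, d_k = 20, 32, 16
history = rng.normal(size=(H, d))
current = history[-1]                              # newest state as the query
w_q, w_k, w_v = (rng.normal(size=(d, d_k)) * d ** -0.5 for _ in range(3))
context, attn = attend_over_history(current, history, w_q, w_k, w_v)
```

The attention weights form a probability distribution over the 20 past steps, so the context vector is a convex combination of the projected history. In a trained model the projections would be learned, and the context would be fused with the current observation before being passed to the policy.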