HAVEN: Hierarchical Adversary-Aware Visibility-Enabled Navigation with Cover Utilization Using Deep Transformer Q-Networks
Mihir Chauhan, Damon Conover, Aniket Bera
AI summary
Problem
Classical planners and memory-less reinforcement learning fail in partially observable, adversarial environments due to limited fields-of-view, occlusions, and the inability to anticipate enemy detection.
Approach
The method decomposes navigation into a high-level Deep Transformer Q-Network that selects occlusion-aware subgoals from short observation histories, and a low-level potential field controller that executes smooth, reactive waypoint tracking.
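As a rough illustration of the low-level layer, the following is a minimal sketch of one potential field update toward a selected subgoal. It is not the authors' controller: the function name `potential_field_step`, the gains `k_att` and `k_rep`, the `influence` radius, and the fixed `step` length are all assumptions chosen for the example, and obstacles are simplified to 2D points.

```python
import math

def potential_field_step(pos, subgoal, obstacles,
                         k_att=1.0, k_rep=0.5, influence=2.0, step=0.1):
    """Return the next 2D position after one fixed-length step.

    All parameters are hypothetical; the source does not specify gains,
    influence radius, or step size.
    """
    # Attractive force pulls linearly toward the subgoal.
    fx = k_att * (subgoal[0] - pos[0])
    fy = k_att * (subgoal[1] - pos[1])

    # Repulsive force pushes away from each obstacle inside the influence radius.
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 1e-9 < d < influence:
            mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d

    # Move a fixed distance along the net force direction.
    norm = math.hypot(fx, fy)
    if norm < 1e-9:
        return pos  # zero net force: stay in place
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)
```

Calling this function repeatedly with the subgoal chosen by the high-level policy would produce a short-horizon, obstacle-avoiding track toward that subgoal.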
Key results
- Higher success rates and faster time-to-goal than classical and RL baselines
- Reduced exposure to adversarial fields-of-view through visibility-aware candidate generation
- Direct 2D-to-3D transfer to Unity-ROS without architectural changes
- Ablations confirm temporal memory and cover-aware design are critical for partial observability
Why it matters
Enables reliable, stealth-aware autonomous navigation for defense, surveillance, and warehouse robotics operating under uncertainty and adversarial threats.
Abstract
Autonomous navigation in partially observable environments requires agents to reason beyond immediate sensor input, exploit occlusion, and ensure safety while progressing toward a goal. These challenges arise in many robotics domains, from urban driving and warehouse automation to defense and surveillance. Classical path planning approaches and memory-less reinforcement learning often fail under limited fields-of-view (FoVs) and occlusions, committing to unsafe or inefficient maneuvers. We propose a hierarchical navigation framework that integrates a Deep Transformer Q-Network (DTQN) as a high-level subgoal selector with a modular low-level controller for waypoint execution. The DTQN consumes short histories of task-aware features, encoding odometry, goal direction, obstacle proximity, and visibility cues, and outputs Q-values to rank candidate subgoals. Visibility-aware candidate generation introduces masking and exposure penalties, rewarding the use of cover and anticipatory safety. A low-level potential field controller then tracks the selected subgoal, ensuring smooth short-horizon obstacle avoidance. We validate our approach in 2D simulation and extend it directly to a 3D Unity–ROS environment by projecting point-cloud perception into the same feature schema, enabling transfer without architectural changes. Results show consistent improvements over classical planners and RL baselines in success rate, safety margins, and time-to-goal, with ablations confirming the value of temporal memory and visibility-aware candidate design. These findings highlight a generalizable framework for safe navigation under uncertainty, with broad relevance across robotic platforms.
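To make the ranking step concrete, here is a minimal sketch of choosing a subgoal from Q-values with candidate masking and an exposure penalty. The abstract does not say whether the exposure penalty enters at selection time or only through the training reward; this sketch applies it at selection time purely for illustration. The function name `select_subgoal`, the `exposure` scores in [0, 1], and `penalty_weight` are hypothetical.

```python
import math

def select_subgoal(q_values, valid_mask, exposure, penalty_weight=1.0):
    """Return the index of the best candidate subgoal, or None if all are masked.

    q_values:   one Q-value per candidate (e.g. produced by a DTQN).
    valid_mask: False for candidates that must not be chosen.
    exposure:   assumed per-candidate exposure to adversary FoV, in [0, 1].
    """
    best_idx, best_score = None, -math.inf
    for i, (q, ok, e) in enumerate(zip(q_values, valid_mask, exposure)):
        if not ok:
            continue  # masked candidates are never selected
        score = q - penalty_weight * e  # penalize exposed candidates
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx

# The exposed candidate (index 1) has the highest raw Q-value
# but loses to the covered candidate (index 2) after the penalty.
choice = select_subgoal([1.0, 2.0, 1.5], [True, True, True], [0.0, 1.0, 0.0])
```

In this example the scores after the penalty are 1.0, 1.0, and 1.5, so the covered candidate at index 2 is chosen over the exposed candidate with the higher raw Q-value.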