ICRA 2026

Temporal Action Representation Learning for Tactical Resource Control and Subsequent Maneuver Generation

Hoseong Jung, Sungil Son, Daesol Cho, Jonghae Park, Changhyun Choi, H. Jin Kim

AI summary

TART leverages contrastive learning and vector quantization to capture causal, multi-modal dependencies between discrete resource usage and continuous maneuvers, outperforming baselines in resource-constrained tactical tasks.

Temporal action representation, Hybrid action spaces, Contrastive learning, Vector quantization, Tactical decision-making, Resource-constrained control

Problem

Prior hybrid-action reinforcement learning methods overlook the causal dependencies between discrete resource decisions and continuous maneuvers, and they fail to capture the multi-modal nature of tactical decisions in dynamic environments.

Approach

TART maximizes a mutual information objective via trajectory-level contrastive learning to align state-action contexts with future maneuvers, then quantizes these representations into a tactical codebook that conditions a factorized policy for diverse maneuver generation.
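As a rough illustration of the contrastive step, the sketch below computes an InfoNCE-style loss, a common choice for maximizing a lower bound on mutual information. The summary does not state which contrastive loss TART uses, so the loss form, the function name `info_nce`, the dot-product similarity, and the temperature value are all assumptions made for illustration. Each context embedding is paired with the future-maneuver embedding at the same index, and the other entries in the batch act as negatives.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(contexts, futures, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper, not from the paper).

    contexts[i] and futures[i] form the positive pair; every futures[j]
    with j != i serves as a negative for contexts[i].
    """
    n = len(contexts)
    total = 0.0
    for i in range(n):
        logits = [dot(contexts[i], futures[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true future.
        total += log_norm - logits[i]
    return total / n


# Toy embeddings: matched pairs should score a lower loss than swapped pairs.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = info_nce(embeddings, embeddings)
loss_swapped = info_nce(embeddings, embeddings[::-1])
```

With a batch of size N, log N minus this loss is a lower bound on the mutual information between contexts and futures, so driving the loss down tightens that bound.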

Key results

Consistently outperforms hybrid-action baselines in maze navigation and air combat simulators
Captures causal dependencies between discrete resource deployment and subsequent maneuvers
Generates temporally coherent and multi-modal tactical behaviors under strict resource budgets
Integrates into standard on-policy RL loops with improved stability and sample efficiency

Why it matters

Provides a scalable framework for resource-aware tactical decision-making in dynamic robotic and autonomous systems where traditional RL struggles with hybrid action spaces.

Abstract

Autonomous robotic systems should reason about resource control and its impact on subsequent maneuvers, especially when operating with limited energy budgets or restricted sensing. Learning-based control is effective in handling complex dynamics and represents the problem as a hybrid action space unifying discrete resource usage and continuous maneuvers. However, prior work on hybrid action spaces has not sufficiently captured the causal dependencies between resource usage and maneuvers. It has also overlooked the multi-modal nature of tactical decisions. Both aspects are critical in fast-evolving scenarios. In this paper, we propose TART, a Temporal Action Representation learning framework for Tactical resource control and subsequent maneuver generation. TART leverages contrastive learning based on a mutual information objective, designed to capture inherent temporal dependencies in resource-maneuver interactions. These learned representations are quantized into discrete codebook entries that condition the policy, capturing recurring tactical patterns and enabling multi-modal and temporally coherent behaviors. We evaluate TART in two domains where resource deployment is critical: (i) a maze navigation task where a limited budget of discrete actions provides enhanced mobility, and (ii) a high-fidelity air combat simulator in which an F-16 agent operates weapons and defensive systems in coordination with flight maneuvers. Across both domains, TART consistently outperforms hybrid-action baselines, demonstrating its effectiveness in leveraging limited resources and producing context-aware subsequent maneuvers.
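The quantization step described in the abstract can be pictured as a nearest-neighbor lookup into a codebook. The following is a minimal sketch under stated assumptions: the function `quantize`, the squared-L2 distance, the toy codebook values, and the concatenation of the state with the selected entry are hypothetical and are not taken from the paper.

```python
def quantize(z, codebook):
    """Return (index, entry) of the codebook vector nearest to z.

    Nearest is measured by squared L2 distance (an assumption; the
    abstract does not specify the distance used).
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    index = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return index, codebook[index]


# Toy codebook with three "tactical" entries (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
index, code = quantize([0.9, 1.2], codebook)

# One simple way to condition a policy on the code: append it to the state.
state = [0.3, -0.7, 0.5]
policy_input = state + code
```

Because each entry is discrete, selecting a different index gives the policy a different conditioning vector, which is one way a codebook can expose multiple behavior modes for the same state.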

Index terms

Reinforcement Learning, Robotics in Under-Resourced Settings, Autonomous Agents