ICRA 2026

Exploring History-Aware Online Actor-Critic for Smart Manufacturing Tasks in the RICAIP Testbed

Tomas Horelican

AI summary

Augmenting online RL policies with a lightweight transformer to encode past states improves performance in sparse-reward and complex control tasks.

reinforcement learning, history-aware policies, transformer, online learning, smart manufacturing, control

Problem

Standard online reinforcement learning policies assume the Markov property and process only the current state, ignoring historical context that is often critical in complex, partially observable manufacturing and robotics tasks.
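To make the limitation concrete, here is a minimal sketch under an illustrative assumption: a hypothetical 1-D task in which the agent observes only position. Two trajectories can end at the same observed position while moving in opposite directions, so a policy that sees only the current state cannot tell them apart, whereas a two-step history recovers the velocity. The function name and the time step `dt` are invented for this example and do not come from the paper.

```python
def velocity_from_history(positions, dt=0.1):
    """Finite-difference velocity estimate from the last two observed positions."""
    if len(positions) < 2:
        return None  # a single observation carries no velocity information
    return (positions[-1] - positions[-2]) / dt


# Two hypothetical trajectories that end at the same observed position.
moving_right = [0.8, 1.0]
moving_left = [1.2, 1.0]

# The current observation alone is identical in both cases.
same_current_obs = moving_right[-1] == moving_left[-1]

# The short history separates them by the sign of the estimated velocity.
v_right = velocity_from_history(moving_right)
v_left = velocity_from_history(moving_left)
```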

Approach

The authors replace the standard single-state input with a tiny non-causal transformer encoder that explicitly processes a fixed history of past states, creating a Latent Temporal Transformer architecture integrated with SAC, TD3, and DIPO.
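The idea of encoding a fixed window of past states without a causal mask can be sketched as follows. This is not the authors' Latent Temporal Transformer: the summary does not give the number of layers or heads, the pooling scheme, or how the latent is wired into SAC, TD3, or DIPO. The sketch assumes a single attention head, a single layer, zero-padding at episode start, a normalized time index appended as positional information, and mean pooling. All class and function names are hypothetical, and the weights are random and untrained.

```python
import math
import random
from collections import deque


class StateHistory:
    """Fixed-length window of the most recent states, zero-padded at episode start."""

    def __init__(self, length, state_dim):
        self.buffer = deque([[0.0] * state_dim for _ in range(length)], maxlen=length)

    def push(self, state):
        self.buffer.append(list(state))  # the oldest state is dropped automatically

    def window(self):
        return [list(s) for s in self.buffer]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyHistoryEncoder:
    """Single-head, single-layer self-attention with no causal mask (assumed design)."""

    def __init__(self, state_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + 1  # +1 for the appended time index (assumption)
        self.w_q = random_matrix(in_dim, latent_dim, rng)
        self.w_k = random_matrix(in_dim, latent_dim, rng)
        self.w_v = random_matrix(in_dim, latent_dim, rng)
        self.latent_dim = latent_dim

    def encode(self, window):
        steps = len(window)
        # Append a normalized time index so the order of states is not lost.
        tokens = [s + [t / max(steps - 1, 1)] for t, s in enumerate(window)]
        q = matmul(tokens, self.w_q)
        k = matmul(tokens, self.w_k)
        v = matmul(tokens, self.w_v)
        k_t = [list(col) for col in zip(*k)]
        scale = math.sqrt(self.latent_dim)
        # No mask is applied: every step attends to every other step in the window.
        weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
        attended = matmul(weights, v)
        # Mean-pool over time to get one latent vector for the actor and critic.
        return [sum(col) / steps for col in zip(*attended)]


history = StateHistory(length=4, state_dim=3)
for t in range(6):
    history.push([float(t), 2.0 * t, -1.0 * t])

encoder = TinyHistoryEncoder(state_dim=3, latent_dim=8)
latent = encoder.encode(history.window())
```

In an actor-critic setup, `latent` would take the place of the single-state input to the policy and value networks. How the paper performs that integration is not specified here, so that step is left out of the sketch.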

Key results

SAC performance improves notably in sparse-reward exploration tasks
TD3 shows reduced compatibility with sparse rewards but benefits in dense settings
DIPO performance remains stable across both standard and history-aware parametrizations
History-aware policies exhibit more efficient, less redundant motion in manipulation tasks

Why it matters

Provides a simple, computationally efficient architectural upgrade that enhances online RL robustness for real-world robotics and smart manufacturing applications.

Abstract

As manufacturing capabilities advance to greater autonomy, interest is increasingly directed toward versatile agents capable of performing complex tasks. Recently, learning-based approaches have shown more rapid progress compared to classical methods. While these advancements are enabled by the offline setting of Imitation Learning (IL), transfer to pure online exploration Reinforcement Learning (RL) remains less explored. This work experiments with a simple extension to the standard Markovian MLP policy by explicitly encoding a history of states using a tiny transformer model.

Index terms

Reinforcement Learning; Machine Learning for Robot Control; Autonomous Agents