ICRA 2026

Stable Worker Intention Recognition Via Transformer and CRF-Ontology Decoding for Human–Robot Collaboration

Hwijin Park, HyunBin Kwon, Hyundo Lee, Cheol Woo Park, HAK YI

AI summary

A Transformer-CRF-ontology framework cuts intention prediction fluctuations by over 60% (change rate 7.9% to 3.0%) and reduces structural violations from 26.5% to 3.7%, enabling more reliable human-robot collaboration.

Human-robot collaboration, intention recognition, transformer, conditional random field, ontology decoding, temporal stability

Problem

Existing intention recognition methods in human-robot collaboration often suffer from temporal prediction instability and logically inconsistent action-tool-part-intention combinations, failing to capture contextual relationships and structural constraints.

Approach

A single-stream Transformer encoder integrates multimodal task context to jointly predict actions, tools, parts, and intentions, followed by CRF decoding for temporal consistency and ontology-based post-processing to enforce structural constraints.
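
To make the CRF decoding step concrete, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF over per-frame intention scores. It is not the authors' implementation: the function name `viterbi_decode`, the toy scores, and the hand-set transition matrix are assumptions for illustration, and in practice the transition scores would be learned jointly with the encoder.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][k]   : score of label k at time step t (e.g. encoder logits)
    transitions[j][k] : score of moving from label j to label k
    Both inputs are hypothetical here; no values come from the paper.
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each label
    backptr: List[List[int]] = []       # backptr[t-1][k] = best previous label for k at t
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for k in range(num_labels):
            best_j = max(range(num_labels),
                         key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_j] + transitions[best_j][k] + emissions[t][k])
            ptr.append(best_j)
        score = new_score
        backptr.append(ptr)
    # Trace the best path backwards from the best final label.
    path = [max(range(num_labels), key=lambda k: score[k])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy example: frame 2 weakly prefers label 1, which per-frame argmax would follow.
emissions = [[2.0, 0.0], [2.0, 0.0], [0.9, 1.0], [2.0, 0.0], [2.0, 0.0]]
no_penalty = [[0.0, 0.0], [0.0, 0.0]]
switch_penalty = [[0.0, -1.0], [-1.0, 0.0]]   # assumed cost for changing intention

unsmoothed = viterbi_decode(emissions, no_penalty)
smoothed = viterbi_decode(emissions, switch_penalty)
```

With zero transition scores the decoder reduces to per-frame argmax and follows the one-frame flicker; with a penalty on switching, the isolated change is suppressed, which is the kind of temporal stabilization the approach relies on.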

Key results

Reduced intention change rate from 7.9% to 3.0% via CRF decoding
Decreased structural violation rate from 26.5% to 6.9% using ontology constraints
With CRF and ontology decoding combined, achieved a 3.0% change rate and a 3.7% violation rate
Maintained baseline-level accuracy across all prediction tasks
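
The reported percentages can be turned into relative reductions with simple arithmetic. The sketch below also shows one plausible way to compute a change rate, as the fraction of consecutive time steps whose predicted intention differs. That definition is an assumption, since the exact metric is not spelled out here, and the helper names are hypothetical.

```python
def change_rate(labels):
    """Fraction of consecutive time steps whose predicted label differs.

    Assumed definition for illustration; the paper's exact metric may differ.
    """
    if len(labels) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes / (len(labels) - 1)


def relative_reduction(before, after):
    """Relative drop from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before


# Reported figures (percent).
crf_drop = relative_reduction(7.9, 3.0)        # change rate with CRF decoding
ontology_drop = relative_reduction(26.5, 6.9)  # violation rate with ontology decoding
combined_drop = relative_reduction(26.5, 3.7)  # violation rate with both combined

print(f"change rate reduction:    {crf_drop:.1%}")
print(f"violation reduction:      {ontology_drop:.1%}")
print(f"combined violation drop:  {combined_drop:.1%}")
```

The last figure assumes the combined 3.7% violation rate is measured against the same 26.5% baseline.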

Why it matters

Enables more reliable and context-aware human-robot collaboration in manufacturing by ensuring temporally stable and structurally consistent intention recognition.

Abstract

This paper proposes a transformer-based single-stream model with CRF–ontology decoding for stable worker intention recognition in human–robot collaboration (HRC). Although existing intention recognition methods achieve high accuracy, they often suffer from temporal prediction instability and logically inconsistent combinations among actions, tools, parts, and intentions. To address these issues, the proposed approach employs a transformer encoder to integrate worker actions and part-related information, thereby capturing the task context and jointly predicting actions, tools, parts, and intentions. For intention prediction, a conditional random field (CRF) is applied to enforce temporal consistency and improve prediction stability. In addition, an ontology-based post-processing step removes infeasible combinations under a given task intention and reselects predictions that satisfy structural constraints. Experimental results show that the CRF reduces the intention change rate from 7.9% to 3.0%, improving temporal stability, while ontology-based decoding decreases the violation rate from 26.5% to 6.9% by eliminating inconsistent predictions. When combined, the proposed method achieves both a low change rate (3.0%) and a low violation rate (3.7%), demonstrating its effectiveness for reliable intention recognition in HRC.
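
The ontology-based reselection described in the abstract can be illustrated with a small sketch. It assumes the ontology is available as a mapping from each intention to its feasible (action, tool, part) triples and that each prediction head exposes per-label scores. The function name `reselect`, the labels, and the scores are all hypothetical and not taken from the paper.

```python
def reselect(intention, action_scores, tool_scores, part_scores, ontology):
    """Pick the best-scoring (action, tool, part) triple allowed for `intention`.

    `ontology` maps an intention to a set of feasible triples (assumed format).
    If the intention has no entry, fall back to the independent argmax per head.
    """
    raw = (max(action_scores, key=action_scores.get),
           max(tool_scores, key=tool_scores.get),
           max(part_scores, key=part_scores.get))
    allowed = ontology.get(intention)
    if not allowed:
        return raw
    if raw in allowed:
        return raw  # already consistent, nothing to repair
    missing = float("-inf")  # labels without a score can never win
    return max(
        sorted(allowed),  # sorted for deterministic tie-breaking
        key=lambda triple: (action_scores.get(triple[0], missing)
                            + tool_scores.get(triple[1], missing)
                            + part_scores.get(triple[2], missing)),
    )


# Hypothetical ontology and head scores for a single time step.
ontology = {
    "fasten": {("screw", "driver", "bolt"), ("pick", "hand", "bolt")},
}
action_scores = {"screw": 2.0, "pick": 1.0}
tool_scores = {"hand": 1.5, "driver": 1.4}
part_scores = {"bolt": 1.0, "panel": 0.2}

# Independent argmax would give ("screw", "hand", "bolt"), which is infeasible.
fixed = reselect("fasten", action_scores, tool_scores, part_scores, ontology)
```

Here the independent per-head argmax pairs a screwing action with a bare hand, which the toy ontology forbids under the "fasten" intention, so the sketch reselects the highest-scoring feasible triple instead.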

Index terms

Intention Recognition; Multi-Modal Perception for HRI; Human-Robot Collaboration