Stable Worker Intention Recognition Via Transformer and CRF-Ontology Decoding for Human–Robot Collaboration
Hwijin Park, HyunBin Kwon, Hyundo Lee, Cheol Woo Park, HAK YI
AI summary
Problem
Existing intention recognition methods in human-robot collaboration often suffer from temporal prediction instability and logically inconsistent action-tool-part-intention combinations, failing to capture contextual relationships and structural constraints.
Approach
A single-stream Transformer encoder integrates multimodal task context to jointly predict actions, tools, parts, and intentions, followed by CRF decoding for temporal consistency and ontology-based post-processing to enforce structural constraints.
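The ontology-based re-selection step can be illustrated with a minimal sketch. The ontology contents, the label names, and the `reselect` function below are hypothetical and are not taken from the paper. The sketch assumes each prediction head outputs log-scores per label and that the ontology lists the allowed (action, tool, part) triples for each intention.

```python
NEG_INF = float("-inf")

# Hypothetical ontology: allowed (action, tool, part) triples per intention.
ONTOLOGY = {
    "fasten_cover": {("screw", "screwdriver", "cover")},
    "insert_shaft": {("insert", "hand", "shaft"), ("press", "mallet", "shaft")},
}

def argmax_label(scores):
    """Return the label with the highest score in a {label: score} dict."""
    return max(scores, key=scores.get)

def reselect(intention, action_scores, tool_scores, part_scores):
    """Pick the best-scoring (action, tool, part) triple allowed under `intention`.

    Falls back to the independent per-head argmax when the ontology has no
    entry for the intention (an assumption made for this sketch).
    """
    candidates = ONTOLOGY.get(intention, set())
    if not candidates:
        return (argmax_label(action_scores),
                argmax_label(tool_scores),
                argmax_label(part_scores))

    def joint_score(triple):
        action, tool, part = triple
        # Sum of log-scores; labels missing from a head count as impossible.
        return (action_scores.get(action, NEG_INF)
                + tool_scores.get(tool, NEG_INF)
                + part_scores.get(part, NEG_INF))

    # sorted() makes tie-breaking deterministic.
    return max(sorted(candidates), key=joint_score)

action_scores = {"screw": -2.0, "insert": -0.1, "press": -3.0}
tool_scores = {"screwdriver": -0.2, "hand": -1.5, "mallet": -2.5}
part_scores = {"cover": -0.3, "shaft": -1.0}

fixed = reselect("fasten_cover", action_scores, tool_scores, part_scores)
```

Here the independent argmax would give ("insert", "screwdriver", "cover"), which is not allowed under the intention "fasten_cover". Restricting the search to the allowed triples yields a structurally consistent prediction.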
Key results
- Reduced intention change rate from 7.9% to 3.0% via CRF decoding
- Decreased structural violation rate from 26.5% to 6.9% using ontology constraints
- Achieved a 3.0% change rate and a 3.7% violation rate when CRF and ontology decoding are combined
- Maintained baseline-level accuracy across all prediction tasks
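The two stability metrics above can be sketched as follows. The exact definitions are assumptions, since the summary does not spell them out: the change rate is taken as the fraction of consecutive frame pairs whose intention label differs, and the violation rate as the fraction of frames whose predicted combination is rejected by a validity check. The names `change_rate`, `violation_rate`, and `is_valid` are illustrative.

```python
def change_rate(labels):
    """Fraction of consecutive frame pairs where the label changes."""
    if len(labels) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    return changes / (len(labels) - 1)

def violation_rate(predictions, is_valid):
    """Fraction of predictions rejected by the validity check `is_valid`."""
    if not predictions:
        return 0.0
    violations = sum(1 for p in predictions if not is_valid(p))
    return violations / len(predictions)

# Toy sequence of per-frame intention labels: two changes over four transitions.
intentions = ["fasten", "fasten", "insert", "insert", "fasten"]

# Toy (intention, tool) pairs checked against a hypothetical set of allowed pairs.
allowed = {("fasten", "screwdriver"), ("insert", "hand")}
pairs = [("fasten", "screwdriver"), ("fasten", "hand"),
         ("insert", "hand"), ("insert", "hand")]

cr = change_rate(intentions)
vr = violation_rate(pairs, lambda p: p in allowed)
```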
Why it matters
Enables more reliable and context-aware human-robot collaboration in manufacturing by ensuring temporally stable and structurally consistent intention recognition.
Abstract
This paper proposes a transformer-based single-stream model with CRF–ontology decoding for stable worker intention recognition in human–robot collaboration (HRC). Although existing intention recognition methods achieve high accuracy, they often suffer from temporal prediction instability and logically inconsistent combinations among actions, tools, parts, and intentions. To address these issues, the proposed approach employs a transformer encoder to integrate worker actions and part-related information, thereby capturing the task context and jointly predicting actions, tools, parts, and intentions. For intention prediction, a conditional random field (CRF) is applied to enforce temporal consistency and improve prediction stability. In addition, an ontology-based post-processing step removes infeasible combinations under a given task intention and reselects predictions that satisfy structural constraints. Experimental results show that the CRF reduces the intention change rate from 7.9% to 3.0%, improving temporal stability, while ontology-based decoding decreases the violation rate from 26.5% to 6.9% by eliminating inconsistent predictions. When combined, the proposed method achieves both a low change rate (3.0%) and a low violation rate (3.7%), demonstrating its effectiveness for reliable intention recognition in HRC.
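To illustrate how CRF decoding can suppress frame-level flicker, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF. It is not the authors' implementation. The emission scores, the transition matrix, and the function names are hypothetical, and the transition scores are chosen by hand to favor staying in the same intention rather than learned.

```python
def framewise_argmax(emissions):
    """Decode each frame independently (no temporal model)."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in emissions]

def viterbi(emissions, transitions):
    """Return the highest-scoring label path under a linear-chain CRF.

    emissions[t][k]   : score of label k at frame t
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])  # best path score ending in each label
    backpointers = []
    for frame in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label for arriving at label j.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + frame[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=score.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Two intentions; frame 2 weakly prefers label 1 (a one-frame flicker).
emissions = [[2.0, 0.0], [2.0, 0.0], [0.9, 1.0], [2.0, 0.0]]
# Hand-set transitions that reward staying and penalize switching.
transitions = [[0.5, -0.5], [-0.5, 0.5]]

raw_path = framewise_argmax(emissions)
crf_path = viterbi(emissions, transitions)
```

In this toy case, independent decoding switches intention for a single frame, while the Viterbi path keeps the same label throughout because the small emission advantage at frame 2 does not outweigh the two switching penalties.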