ICRA 2026

A Distributed Multi-Modal Sensing Approach for Human Activity Recognition in Real-Time Human-Robot Collaboration

Van Anh Ho, Fulvio Mastrogiovanni

AI summary

A late-fusion multi-modal system combining IMU gloves and vision-based tactile sensing achieves high-accuracy, real-time human activity recognition for responsive human-robot collaboration.

Human activity recognition, multi-modal sensing, human-robot collaboration, vision-based tactile sensing, IMU data glove, real-time classification

Problem

Existing human activity recognition systems struggle to simultaneously capture complex hand kinematics and contact forces in dynamic, real-world human-robot collaboration settings without suffering from occlusion, sensor drift, or latency.

Approach

The authors fuse motion data from a modular IMU data glove with tactile feedback from a vision-based cylindrical sensor using a late-fusion neural network, enabling real-time classification of hand activities during physical interaction.
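
As a rough illustration of what late fusion means here, the sketch below combines class scores from two independent modality branches at the decision level. It is not the authors' network: the function names, the weighted-average fusion rule, and the `imu_weight` parameter are all assumptions made for this example, and the paper's fusion layer may well be learned rather than fixed.

```python
import math
from typing import List, Sequence

NUM_CLASSES = 15  # the summary reports 15 distinct hand actions


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities (max-shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(imu_logits: Sequence[float],
                tactile_logits: Sequence[float],
                imu_weight: float = 0.5) -> List[float]:
    """Hypothetical decision-level fusion: weighted average of per-branch probabilities.

    Each branch (IMU glove, tactile sensor) is assumed to have already
    produced one score per class; fusion happens only after that point.
    """
    if len(imu_logits) != len(tactile_logits):
        raise ValueError("both branches must score the same set of classes")
    if not 0.0 <= imu_weight <= 1.0:
        raise ValueError("imu_weight must lie in [0, 1]")
    p_imu = softmax(imu_logits)
    p_tactile = softmax(tactile_logits)
    return [imu_weight * a + (1.0 - imu_weight) * b
            for a, b in zip(p_imu, p_tactile)]


def predict(fused: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(fused)), key=fused.__getitem__)


# Toy scores: both branches favour class 3, the tactile branch also likes class 7.
imu_scores = [0.0] * NUM_CLASSES
imu_scores[3] = 2.0
tactile_scores = [0.0] * NUM_CLASSES
tactile_scores[3] = 1.0
tactile_scores[7] = 1.5

fused_probs = late_fusion(imu_scores, tactile_scores)
predicted_class = predict(fused_probs)
```

Because each branch is scored separately, one modality can in principle be down-weighted or dropped when it is unreliable, which is a commonly cited motivation for late over early fusion.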

Key results

94.64% accuracy and 95.60% F1-score in offline classification of 15 distinct hand actions
Robust real-time online classification validated with event-based error metrics under static conditions
Successful dynamic validation where the robot adaptively adjusted its trajectory based on recognized human gestures
Demonstrated viability of late-fusion multi-modal sensing for safe, responsive physical collaboration
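
For readers unfamiliar with the two offline metrics quoted above, the following minimal sketch computes accuracy and a macro-averaged F1-score from label sequences. The summary does not say which F1 averaging the authors used, so macro averaging is an assumption here, and the labels in the example are toy values, not data from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three classes and one misclassified sample.
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 0, 1, 2, 2, 2]
acc = accuracy(true_labels, pred_labels)
f1 = macro_f1(true_labels, pred_labels)
```

The two numbers can differ, as they do in the reported results, because F1 weighs precision and recall per class rather than counting correct samples overall.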

Why it matters

Provides a scalable, occlusion-resilient sensing framework that enables robots to safely interpret and dynamically respond to human physical intentions during collaborative tasks.

Abstract

Human activity recognition (HAR) is fundamental in human-robot collaboration (HRC), enabling robots to respond to and dynamically adapt to human intentions. This paper introduces a HAR system combining a modular data glove equipped with Inertial Measurement Units and a vision-based tactile sensor to capture hand activities in contact with a robot. We tested our activity recognition approach under different conditions, including offline classification of segmented sequences, real-time classification under static conditions, and a realistic HRC scenario. The experimental results show a high accuracy for all the tasks, suggesting that multiple collaborative settings could benefit from this multi-modal approach.

Index terms

Multi-Modal Perception for HRI, Physical Human-Robot Interaction