Semantically-Aware Diver Activity Recognition Framework for Effective Underwater Multi-Human-Robot Collaboration
Sadman Sakib Enan, Junaed Sattar
AI summary
Problem
Autonomous underwater vehicles struggle to recognize diver activities in low-visibility environments due to a severe lack of large-scale datasets, hindering safe and effective human-robot collaboration.
Approach
The authors propose DAR-Net, a transformer-based framework that jointly optimizes activity classification with pixel-level semantic segmentation to focus on relevant divers and robots, alongside the first Underwater Diver Activity (UDA) dataset of 2,640 annotated images.
Key results
- 73.33% classification accuracy, outperforming state-of-the-art baselines
- Release of the UDA dataset with 2,640 pixel-level annotated images across six activity categories
- Semantic supervision significantly improves model attention on relevant scene elements
- Robust performance across precision, recall, and F1-score metrics on held-out test data
Why it matters
Provides the foundational dataset and recognition capability necessary for advancing safe, real-time collaboration between human divers and autonomous underwater vehicles.
Abstract
Effective multi-human-robot collaboration is essential for expanding human-led operations in the challenging and high-risk underwater environment. For autonomous underwater vehicles (AUVs) to become true teammates, they must be able to comprehend their surroundings and recognize a diver’s activities to offer assistance and ensure safety. Towards this goal, we introduce DAR-Net, a novel transformer-based framework that analyzes complex underwater scenes to classify diver activities. Our contribution lies in a semantically guided learning formulation that couples transformer-based temporal reasoning with pixel-level scene supervision. This multi-loss training strategy explicitly aligns global activity recognition with local human–robot interaction semantics, which is particularly critical in low-visibility underwater conditions. To address the significant challenge of data scarcity in this domain, we present the first-ever Underwater Diver Activity (UDA) dataset, a foundational resource containing over 2,600 annotated images with pixel-level masks. Through rigorous experimental evaluations in a controlled environment, we demonstrate that DAR-Net achieves promising accuracy in recognizing six distinct diver activities, outperforming state-of-the-art models. While this dataset provides a crucial baseline, our work serves as a pioneering step, laying the groundwork for future research and facilitating the development of more intelligent, collaborative underwater robotic systems.
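The multi-loss idea described in the abstract, an activity classification objective trained jointly with pixel-level segmentation supervision, can be illustrated with a minimal sketch. This is not the authors' implementation: the weighted-sum form, the `seg_weight` parameter, the function names, and the choice of three segmentation classes in the usage example are all assumptions made for illustration, since the abstract does not specify how the losses are combined.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max before exponentiating for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def joint_loss(activity_logits, activity_label,
               pixel_logits, pixel_labels, seg_weight=1.0):
    """Hypothetical joint objective: classification loss plus weighted
    mean per-pixel segmentation loss. The weighted sum is an assumption."""
    cls_loss = cross_entropy(activity_logits, activity_label)
    seg_loss = sum(
        cross_entropy(logits, label)
        for logits, label in zip(pixel_logits, pixel_labels)
    ) / len(pixel_labels)
    return cls_loss + seg_weight * seg_loss


# Usage: six activity classes (as in the paper) and, as an assumed example,
# three per-pixel classes for a tiny four-pixel "image".
activity_logits = [0.0] * 6
pixel_logits = [[0.0, 0.0, 0.0] for _ in range(4)]
pixel_labels = [0, 1, 2, 0]
loss = joint_loss(activity_logits, 2, pixel_logits, pixel_labels, seg_weight=0.5)
print(round(loss, 4))
```

With uniform logits, each term reduces to the log of its class count, so the value is easy to check by hand. In a real training setup the two terms would come from separate network heads that share a backbone, so the segmentation gradient can shape the features used for activity classification.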