EXOM: An Excavator Operation Monitoring Framework with Onboard Vision and Sensor Data
Seok-Kyu Kang, Seong-Gye Lee, Gye-Bong Jang
AI summary
Problem
Prior excavator monitoring systems either rely on costly external infrastructure, demand excessive computation that prevents real-time embedded deployment, or fail to accurately count excavation cycles and distinguish fine-grained operational phases.
Approach
The framework splits processing into two modules: a vision module that tracks bucket state transitions from a single cabin camera to count excavations, and a sensor module that uses a learnable adaptive window to sparsify hydraulic signals for classifying non-excavation tasks.
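The summary does not specify which bucket states the vision module tracks or how its transitions are turned into counts. The sketch below is a minimal illustration under stated assumptions. It assumes a hypothetical per-frame label (`dig`, `swing`, `dump`) from some bucket detector and a hypothetical `min_run` debounce that suppresses short label flickers. It counts one excavation for each cycle running from dig onset to the end of the dump phase. None of these names or rules come from the paper.

```python
from itertools import groupby
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-frame bucket labels; EXOM's actual state set is not given here.
DIG, SWING, DUMP = "dig", "swing", "dump"


def debounce(labels: Sequence[str], min_run: int) -> List[str]:
    """Replace label runs shorter than min_run with the last stable label."""
    stable: List[str] = []
    current: Optional[str] = None
    for label, group in groupby(labels):
        run_len = sum(1 for _ in group)
        # Accept a new state only if it persists long enough (or nothing is stable yet).
        if run_len >= min_run or current is None:
            current = label
        stable.extend([current] * run_len)
    return stable


def count_excavations(
    labels: Sequence[str], min_run: int = 3
) -> Tuple[int, List[Tuple[int, int]]]:
    """Count dig -> ... -> dump cycles; return (count, inclusive frame sections)."""
    stable = debounce(labels, min_run)
    sections: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, label in enumerate(stable):
        prev = stable[t - 1] if t > 0 else None
        # Leaving the dump state closes the currently open cycle.
        if prev == DUMP and label != DUMP and start is not None:
            sections.append((start, t - 1))
            start = None
        # Entering the dig state opens a new cycle if none is open.
        if label == DIG and prev != DIG and start is None:
            start = t
    # A cycle still in its dump phase at the end of the clip is closed here.
    if start is not None and stable and stable[-1] == DUMP:
        sections.append((start, len(stable) - 1))
    return len(sections), sections


if __name__ == "__main__":
    frames = ([SWING] * 3 + [DIG] * 4 + [DUMP] * 1 + [DIG] * 2 + [SWING] * 3
              + [DUMP] * 3 + [SWING] * 3 + [DIG] * 3 + [SWING] * 3 + [DUMP] * 4)
    print(count_excavations(frames))                # (2, [(3, 15), (19, 28)])
    print(count_excavations(frames, min_run=1)[0])  # 3
```

With `min_run=1`, the one-frame `dump` flicker at frame 7 splits the first cycle in two and inflates the count to 3. For that reason the sketch assumes some form of temporal smoothing before transitions are counted.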
Key results
- Achieves state-of-the-art accuracy in both excavation counting and operation classification
- Delivers real-time end-to-end latency (≤30 ms) on resource-limited NVIDIA Jetson Orin NX hardware
- Eliminates external infrastructure by relying solely on factory-installed cabin cameras and built-in hydraulic sensors
- Introduces EXOM-I, a unified metric balancing section-level F1 score and normalized excavation counting accuracy
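The summary does not state how EXOM-I combines its two terms, so the following is an illustration only. It assumes three things. First, the section-level F1 uses greedy one-to-one matching of predicted and ground-truth sections at a temporal IoU of at least 0.5. Second, normalized counting accuracy is max(0, 1 - |N_pred - N_gt| / N_gt). Third, the two terms are combined by a harmonic mean. All three choices, and the names `section_f1`, `count_accuracy`, and `exom_i_sketch`, are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Section = Tuple[int, int]  # inclusive (start_frame, end_frame)


def temporal_iou(a: Section, b: Section) -> float:
    """Intersection-over-union of two inclusive frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def section_f1(pred: List[Section], gt: List[Section], thr: float = 0.5) -> float:
    """Greedy one-to-one matching at IoU >= thr, then F1 over matched sections."""
    if not pred and not gt:
        return 1.0
    matched: Set[int] = set()
    tp = 0
    for p in pred:
        best_k: Optional[int] = None
        best_iou = thr
        for k, g in enumerate(gt):
            if k in matched:
                continue
            v = temporal_iou(p, g)
            if v >= best_iou:
                best_k, best_iou = k, v
        if best_k is not None:
            matched.add(best_k)
            tp += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gt)
    return 2 * precision * recall / (precision + recall)


def count_accuracy(n_pred: int, n_gt: int) -> float:
    """Counting accuracy normalized by the ground-truth count, clipped at 0."""
    if n_gt == 0:
        return 1.0 if n_pred == 0 else 0.0
    return max(0.0, 1.0 - abs(n_pred - n_gt) / n_gt)


def exom_i_sketch(pred: List[Section], gt: List[Section], thr: float = 0.5) -> float:
    """Hypothetical EXOM-I: harmonic mean of section F1 and count accuracy."""
    f1 = section_f1(pred, gt, thr)
    acc = count_accuracy(len(pred), len(gt))
    return 0.0 if f1 + acc == 0 else 2 * f1 * acc / (f1 + acc)


if __name__ == "__main__":
    gt = [(0, 9), (20, 29), (40, 49)]
    pred = [(1, 10), (21, 28)]
    print(round(section_f1(pred, gt), 3), round(count_accuracy(2, 3), 3),
          round(exom_i_sketch(pred, gt), 3))  # 0.8 0.667 0.727
```

A harmonic mean drops sharply when either term is low, which is one way to "balance" segmentation quality against counting accuracy. A weighted arithmetic mean would be an equally plausible reading of the summary.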
Why it matters
Provides a scalable, low-cost solution for real-time heavy machinery monitoring, directly supporting productivity optimization and autonomous construction deployment.
Abstract
Reliable monitoring of excavator operations in real-world environments requires accurate excavation counting to ensure productivity, efficient computation for real-time inference, and cost-effective on-board sensing, a combination that most prior systems fail to achieve. We present EXOM (EXcavator Operation Monitoring), a lightweight and deployable framework that relies solely on a factory-installed cabin camera and built-in hydraulic sensors. EXOM integrates two embedded-friendly modules: a Video data Processing Module (VPM), where an ECSE algorithm leverages bucket detection to estimate excavation sections and counts from state transitions, and a Sensor data Processing Module (SPM), where an Adaptive Window (AW) process sparsifies time-series signals and drives a segmentation model through a learnable sparse tensor. To capture deployability, we introduce EXOM-I, a unified index that combines section-level F1 and normalized excavation counting accuracy. Experiments with real-world data demonstrate that EXOM consistently outperforms previous approaches, achieving state-of-the-art performance with real-time latency on resource-limited embedded excavator hardware.
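The abstract describes the AW process only at a high level. In EXOM the window is learnable and drives the segmentation model through a learnable sparse tensor. What follows is a deliberately simpler, non-learnable stand-in, not the paper's method. It shows what windowed sparsification of a multichannel hydraulic signal can look like. Each window grows until any channel's range exceeds a fixed tolerance `tol` or the window reaches `max_len` samples. Each window is then summarized by one (start, length, mean) entry. The function names, `tol`, and `max_len` are hypothetical.

```python
import numpy as np


def adaptive_window_sparsify(signal: np.ndarray, tol: float, max_len: int):
    """Split a (T, C) signal into variable-length windows.

    Returns (starts, lengths, values): values[k] is the per-channel mean of
    window k. A window closes when any channel's max-min range would exceed
    tol, or when it reaches max_len samples (a fixed rule, not a learned one).
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]  # treat a 1-D input as a single channel
    T = signal.shape[0]
    starts, lengths, values = [], [], []
    s = 0
    while s < T:
        lo = signal[s].copy()
        hi = signal[s].copy()
        e = s + 1
        while e < T and e - s < max_len:
            lo_new = np.minimum(lo, signal[e])
            hi_new = np.maximum(hi, signal[e])
            if np.any(hi_new - lo_new > tol):
                break  # adding sample e would make the window too heterogeneous
            lo, hi = lo_new, hi_new
            e += 1
        starts.append(s)
        lengths.append(e - s)
        values.append(signal[s:e].mean(axis=0))
        s = e
    return np.array(starts), np.array(lengths), np.stack(values)


def densify(values: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """Piecewise-constant reconstruction back to shape (T, C)."""
    return np.repeat(values, lengths, axis=0)


if __name__ == "__main__":
    sig = np.array([[0.0], [0.1], [0.05], [5.0], [5.1], [5.0], [5.05], [9.0]])
    starts, lengths, values = adaptive_window_sparsify(sig, tol=0.5, max_len=4)
    print(starts.tolist(), lengths.tolist())  # [0, 3, 7] [3, 4, 1]
```

In the example, eight samples collapse into three windows. The piecewise-constant reconstruction stays within `tol` of the input on every channel, because each window's mean lies inside that window's min-max range. A learnable variant would replace the fixed `tol` and `max_len` rule with trained parameters.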