ICRA 2026

EXOM: An Excavator Operation Monitoring Framework with Onboard Vision and Sensor Data

Seok-Kyu Kang, Seong-Gye Lee, Gye-Bong Jang

AI summary

EXOM enables accurate, real-time excavator operation monitoring and excavation counting using only factory-installed cameras and hydraulic sensors on embedded hardware.
Excavator monitoring, Onboard vision, Sensor fusion, Real-time inference, Embedded systems, Construction automation

Problem

Prior excavator monitoring systems rely on costly external infrastructure, demand too much computation for real-time embedded deployment, or fail to count excavation cycles accurately and distinguish fine-grained operational phases.

Approach

The framework splits processing into two modules: a vision module that tracks bucket state transitions from a single cabin camera to count excavations, and a sensor module that uses a learnable adaptive window to sparsify hydraulic signals for classifying non-excavation tasks.
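
As an illustration of counting excavations from bucket state transitions, the following is a minimal sketch, not the paper's ECSE algorithm. It assumes a hypothetical per-frame classifier that labels the bucket as "empty" or "loaded", and it assumes that one excavation corresponds to one confirmed empty-to-loaded transition. The function name `count_excavations` and the `min_frames` debounce parameter are illustrative choices, not names from the paper.

```python
from typing import Iterable, List, Tuple


def count_excavations(
    frame_states: Iterable[str], min_frames: int = 3
) -> Tuple[int, List[int]]:
    """Count empty -> loaded transitions in a per-frame bucket-state stream.

    Assumption: each frame carries a label such as "empty" or "loaded".
    A new state is accepted only after it persists for `min_frames`
    consecutive frames, which suppresses single-frame detector flicker.
    Returns the count and the frame indices where each transition
    was confirmed.
    """
    stable = None     # last confirmed (debounced) state
    candidate = None  # state currently being debounced
    run = 0           # consecutive frames seen for the candidate
    count = 0
    events: List[int] = []

    for i, state in enumerate(frame_states):
        if state == candidate:
            run += 1
        else:
            candidate, run = state, 1

        if run >= min_frames and candidate != stable:
            # Only an empty -> loaded change counts as one excavation.
            if stable == "empty" and candidate == "loaded":
                count += 1
                events.append(i)
            stable = candidate

    return count, events


frames = (
    ["empty"] * 5 + ["loaded"] + ["empty"] * 3   # one-frame flicker, ignored
    + ["loaded"] * 4 + ["empty"] * 4 + ["loaded"] * 3
)
print(count_excavations(frames))
```

With the default debounce of three frames, the single "loaded" frame at index 5 is ignored, so the sample stream yields two excavations. Setting `min_frames=1` would count that flicker as a third one, which is why some form of temporal smoothing is usually needed on top of per-frame detections.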

Key results

  • Achieves state-of-the-art accuracy in both excavation counting and operation classification
  • Delivers real-time end-to-end latency (≤30 ms) on resource-limited NVIDIA Jetson Orin NX hardware
  • Eliminates external infrastructure by relying solely on factory-installed cabin cameras and built-in hydraulic sensors
  • Introduces EXOM-I, a unified metric balancing section-level F1 score and normalized excavation counting accuracy
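
The summary does not give the formula for EXOM-I, so the sketch below only illustrates one way a section-level F1 score and a normalized counting accuracy could be combined into a single index. Both the normalization (one minus the relative count error, clipped at zero) and the weighted average with weight `alpha` are assumptions made here for illustration, and the function names are hypothetical. The paper's actual definition may differ.

```python
def counting_accuracy(pred_count: int, true_count: int) -> float:
    """Assumed normalization: 1 - relative count error, clipped to [0, 1]."""
    if true_count <= 0:
        raise ValueError("true_count must be positive")
    return max(0.0, 1.0 - abs(pred_count - true_count) / true_count)


def combined_index(section_f1: float, count_acc: float, alpha: float = 0.5) -> float:
    """Hypothetical unified index: weighted average of the two scores.

    This is NOT the published EXOM-I formula; it only shows the general
    idea of balancing segmentation quality against counting quality.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * section_f1 + (1.0 - alpha) * count_acc


# Example: 9 excavations predicted where 10 occurred, section-level F1 of 0.8.
acc = counting_accuracy(9, 10)
print(acc, combined_index(0.8, acc))
```

A single index of this kind penalizes a system that segments operations well but miscounts cycles, or the reverse, which is the balance the summary attributes to EXOM-I.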

Why it matters

Provides a scalable, low-cost solution for real-time heavy machinery monitoring, directly supporting productivity optimization and autonomous construction deployment.

Abstract

Reliable monitoring of excavator operations in real-world environments requires accurate excavation counting to ensure productivity, efficient computation for real-time inference, and cost-effective on-board sensing, a combination that most prior systems fail to achieve. We present EXOM (EXcavator Operation Monitoring), a lightweight and deployable framework that relies solely on a factory-installed cabin camera and built-in hydraulic sensors. EXOM integrates two embedded-friendly modules: a Video data Processing Module (VPM), where an ECSE algorithm leverages bucket detection to estimate excavation sections and counts from state transitions, and a Sensor data Processing Module (SPM), where an Adaptive Window (AW) process sparsifies time-series signals and drives a segmentation model through a learnable sparse tensor. To capture deployability, we introduce EXOM-I, a unified index that combines section-level F1 and normalized excavation counting accuracy. Experiments with real-world data demonstrate that EXOM consistently outperforms previous approaches, achieving state-of-the-art performance with real-time latency on resource-limited embedded excavator hardware.

Index terms

Robotics and Automation in Construction; Sensor Fusion; Deep Learning Methods