Detection of EMU Components Based on Optical Flow Attention Prior and Multi-Modal RGBD RTDETR
Mingjun Cong, Gang Peng, Yongchang Tang, Chaowei Song, Chaoze Wang
AI summary
Problem
Manual inspection of high-speed rail EMU chassis is inefficient and error-prone due to complex backgrounds and diverse, compact components. Existing AI detectors struggle to accurately localize and classify these parts under varying conditions.
Approach
RTDETR-FAMC combines a dual-branch CSwin Transformer for RGB-D feature extraction with Sea-RAFT optical flow to generate dynamic spatial attention masks, enhanced by wavelet-based multi-scale fusion and channel-space attention modules.
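The flow-to-attention step can be illustrated with a minimal sketch. The summary does not say how the flow field is converted into a mask, so everything here is an assumption: the flow is taken to be a dense field of per-pixel displacements, the mask is the min-max normalized flow magnitude, and the mask is applied as a residual weighting `feature * (1 + mask)`. The function names `flow_to_attention` and `apply_attention` are hypothetical and do not come from the paper.

```python
import math


def flow_to_attention(flow, eps=1e-8):
    """Map a dense flow field to a spatial mask with values in [0, 1].

    flow: H x W nested list of (u, v) displacement tuples.
    Assumption: larger displacement means the region deserves more attention.
    """
    mag = [[math.hypot(u, v) for (u, v) in row] for row in flow]
    lo = min(min(row) for row in mag)
    hi = max(max(row) for row in mag)
    scale = hi - lo
    if scale < eps:
        # Uniform flow carries no spatial preference, so return an all-zero mask.
        return [[0.0 for _ in row] for row in mag]
    return [[(m - lo) / scale for m in row] for row in mag]


def apply_attention(feature, mask):
    """Residual re-weighting: every location keeps at least its original value."""
    return [
        [f * (1.0 + m) for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(feature, mask)
    ]


# Toy 2 x 2 example: only the top-right pixel moves (displacement length 5).
flow = [[(0.0, 0.0), (3.0, 4.0)],
        [(0.0, 0.0), (0.0, 0.0)]]
mask = flow_to_attention(flow)
weighted = apply_attention([[1.0, 1.0], [1.0, 1.0]], mask)
```

The residual form is chosen so that a zero mask leaves features unchanged rather than suppressing them. A real implementation would operate on tensors and would likely learn the mapping from flow to mask.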
Key results
- Achieves 0.952 mAP50 on a custom high-resolution EMU chassis dataset
- Outperforms YOLO series and standard RTDETR by at least 3% in mAP50
- Reduces model parameters to 46.2M while maintaining high detection accuracy
- Effectively localizes and classifies 34 distinct EMU component types across 28 camera positions
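For readers unfamiliar with the metric, mAP50 counts a predicted box as correct when its class matches the ground truth and its intersection over union (IoU) with the ground-truth box is at least 0.5. The sketch below shows only that matching criterion, not the full average-precision computation. The `(x1, y1, x2, y2)` box format and the function names are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, pred_cls, gt_box, gt_cls, thr=0.5):
    """Matching rule behind mAP50: same class and IoU >= 0.5."""
    return pred_cls == gt_cls and iou(pred_box, gt_box) >= thr


gt = (0, 0, 10, 10)
loose = (5, 0, 15, 10)   # overlaps half of the ground-truth box
tight = (2, 0, 12, 10)   # overlaps 80% of the ground-truth box
```

Here `loose` has an IoU of 1/3 and would be rejected, while `tight` has an IoU of 2/3 and would be accepted.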
Why it matters
Enables safer, faster, and more reliable automated maintenance for high-speed rail networks, reducing reliance on labor-intensive manual inspections.
Abstract
To address challenges in high-speed train inspection such as complex backgrounds, diverse component types, and compact dimensions, this paper proposes a defect detection method called RTDETR-FAMC (RTDETR with Optical Flow Attention and Multimodal CSwin Transformer). The approach fuses RGB images and depth data through a dual-branch CSwin Transformer backbone that exploits both visual and depth information. In parallel, an improved Sea-RAFT optical flow estimator is applied to standard images and test images to generate a dynamic spatial attention prior that guides the network toward target regions. A Mask Feature Fusion (MFF) module jointly optimizes channel and spatial attention, while HWD wavelet-transform downsampling and CSP-PAC multi-scale feature fusion modules further improve detection accuracy. Experiments on a self-built fine-grained scanning dataset of high-speed rail EMUs (3,881 high-resolution images) show significant accuracy gains over mainstream detection algorithms. Compared with the YOLO series and standard RTDETR, the proposed approach improves mAP50 by at least 3%, validating it as a reliable technical solution for intelligent EMU inspection.
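As a rough illustration of wavelet-transform downsampling, the sketch below applies a one-level 2D Haar transform to a single-channel grid. It assumes that HWD refers to Haar-wavelet-based downsampling, which the abstract does not state explicitly. The function name `haar_downsample` is hypothetical, and any learned convolution that the actual module applies to the stacked sub-bands is omitted.

```python
def haar_downsample(x):
    """One-level 2D Haar transform of an H x W grid (H and W must be even).

    Returns four (H/2) x (W/2) sub-bands: an average band and three
    detail bands. Resolution halves, but no information is discarded.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    avg, d_col, d_row, d_diag = [], [], [], []
    for i in range(0, h, 2):
        r_avg, r_col, r_row, r_diag = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_avg.append((a + b + c + d) / 2)   # local average (orthonormal scaling)
            r_col.append((a - b + c - d) / 2)   # left minus right column
            r_row.append((a + b - c - d) / 2)   # top minus bottom row
            r_diag.append((a - b - c + d) / 2)  # diagonal difference
        avg.append(r_avg)
        d_col.append(r_col)
        d_row.append(r_row)
        d_diag.append(r_diag)
    return avg, d_col, d_row, d_diag


bands = haar_downsample([[1, 2], [3, 4]])
```

Unlike strided convolution or pooling, the four sub-bands together preserve the energy of the input (here 25 + 1 + 4 + 0 = 30 = 1 + 4 + 9 + 16). This is the usual motivation for wavelet downsampling when small components must remain detectable.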