MFCC-Inspired Spectral Feature Extraction for Robust Touch Interaction in Social Robots
JiSoo Kim, Sun Jun Hwang, Hyojin Kim, Dong Joon Hwang, Hui Sung Lee
AI summary
Problem
Conventional touch pattern recognition struggles with inter-user variability and often relies on bulky, expensive sensor arrays. Simple capacitive sensors offer a low-cost alternative but lack the robustness needed for reliable classification across diverse users.
Approach
The authors adapt Mel-Frequency Cepstral Coefficients (MFCC) from speech analysis to capacitive touch signals by replacing the auditory Mel scale with a novel, data-driven frequency reference axis tailored to touch sensor characteristics. These compact spectral features are then classified using lightweight machine learning models.
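As a rough illustration of the idea, the following minimal sketch computes MFCC-style cepstral features from one frame of a touch signal, with the Mel filterbank swapped for triangular filters placed on an arbitrary frequency axis. It is not the authors' implementation: the function names, the sampling rate, the frame length, and especially the band edges in `example_edges_hz` are placeholder assumptions, since the summary does not give the data-driven axis itself.

```python
import numpy as np


def triangular_filterbank(edges_hz, n_fft, fs):
    """Triangular filters whose corner frequencies come from edges_hz.

    In standard MFCC the edges are equally spaced on the Mel scale; here
    they are simply passed in, so any custom frequency axis can be used.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    n_filters = len(edges_hz) - 2
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # Triangle = min of the two slopes, clipped at zero outside the band.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct2(x):
    """Unnormalized DCT-II of a 1-D array."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T


def cepstral_features(frame, edges_hz, fs, n_coeffs):
    """Window -> power spectrum -> filterbank -> log -> DCT -> truncate."""
    windowed = frame * np.hanning(frame.size)
    power = np.abs(np.fft.rfft(windowed)) ** 2
    bank = triangular_filterbank(edges_hz, frame.size, fs)
    log_energy = np.log(bank @ power + 1e-10)  # small floor avoids log(0)
    return dct2(log_energy)[:n_coeffs]


# Hypothetical values, chosen only so the example runs.
example_fs = 100.0                                   # assumed sampling rate (Hz)
example_edges_hz = np.array([0.0, 2.0, 5.0, 10.0, 20.0, 35.0, 50.0])
example_frame = np.sin(2 * np.pi * 5.0 * np.arange(64) / example_fs)
example_features = cepstral_features(
    example_frame, example_edges_hz, example_fs, n_coeffs=4
)
```

Passing the band edges in as data keeps the rest of the pipeline identical to ordinary MFCC, so only the frequency axis changes when moving from audio to touch signals.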
Key results
- Achieved 94–98% accuracy and F1-scores across six social touch patterns on internal and external datasets
- Demonstrated strong subject-independent generalization with ~93% mean accuracy in leave-one-subject-out validation
- Enabled real-time on-device inference on an STM32 microcontroller with a latency of only 164 µs per frame and 624 bytes of memory usage
- Outperformed baseline models using raw or rFFT inputs, particularly in cross-user generalization
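The leave-one-subject-out protocol mentioned in the results can be sketched as follows: every subject is held out once, a model is trained on the remaining subjects, and the per-subject accuracies are averaged. This is a minimal sketch under stated assumptions. The nearest-centroid classifier is only a stand-in chosen to keep the example self-contained, not the lightweight models used by the authors, and all function names are hypothetical.

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    subject_ids = np.asarray(subject_ids)
    for held_out in np.unique(subject_ids):
        test_mask = subject_ids == held_out
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (assumption): assign the class of the closest mean."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def loso_mean_accuracy(features, labels, subject_ids):
    """Mean accuracy over folds, each fold testing on one unseen subject."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    accuracies = []
    for _, train_idx, test_idx in leave_one_subject_out(subject_ids):
        predicted = nearest_centroid_predict(
            features[train_idx], labels[train_idx], features[test_idx]
        )
        accuracies.append(np.mean(predicted == labels[test_idx]))
    return float(np.mean(accuracies))
```

Because no sample from the held-out subject is seen during training, the averaged score reflects how well the features transfer to a new user rather than how well the model memorizes known users.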
Why it matters
Enables reliable, low-cost touch interaction in real-time social robots without requiring complex hardware or heavy computational resources.
Abstract
Touch is a fundamental modality for conveying emotions and intentions in Human–Robot Interaction. However, conventional approaches to touch pattern recognition often lack robustness to inter-user variability, whereas alternative solutions are frequently bulky or costly. This study proposes a novel feature extraction framework for touch pattern recognition, which adapts MFCC from speech processing to capacitive touch signals. The proposed method preserves the strengths of MFCC (dimensionality reduction and noise robustness) while addressing the physical differences between audio and touch signals by introducing a new frequency reference axis in place of the conventional Mel scale. To evaluate its effectiveness, a representative set of social touch patterns, including gestures traditionally difficult to classify, was defined and analyzed. The proposed framework ensures stable recognition across diverse users while reducing feature dimensionality for efficient operation in lightweight models. This efficiency highlights its suitability for real-time robotic interfaces.