Proprioceptive Image: An Image Representation of Proprioceptive Data from Quadruped Robots for Contact Estimation Learning
Gabriel Fischer Abati, João Carlos Virgolino Soares, Giulio Turrisi, Victor Barasuol, Claudio Semini
AI summary
Problem
Data-driven contact estimation for quadruped robots typically relies on raw time-series inputs that struggle to efficiently capture complex temporal dynamics and inter-signal correlations, limiting accuracy and generalization across diverse terrains.
Approach
The authors propose Proprioceptive Images (PI), a novel encoding method that converts multi-channel proprioceptive signals into structured 2D images that preserve both temporal dynamics and the robot's physical layout, enabling standard CNNs to extract richer spatial features.
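As a rough illustration of the encoding idea, the sketch below stacks a short window of per-leg signals into a 2D grid whose rows are grouped by leg and whose columns are time steps. It is a minimal sketch, not the authors' implementation: the leg order `LEG_ORDER`, the function name `to_proprioceptive_image`, the per-row min-max scaling to 0..255, and the input layout are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical leg order; the paper's actual spatial arrangement may differ.
LEG_ORDER = ("LF", "RF", "LH", "RH")


def to_proprioceptive_image(
    window: Dict[str, List[Sequence[float]]]
) -> List[List[int]]:
    """Arrange a window of proprioceptive signals as a 2D grid of pixel values.

    window[leg] holds one time series per signal for that leg, all of equal
    length T. Rows of the result are grouped by leg (following LEG_ORDER) and
    columns are time steps, so neighbouring rows belong to the same limb.
    """
    rows: List[List[int]] = []
    for leg in LEG_ORDER:
        for series in window[leg]:
            lo, hi = min(series), max(series)
            span = hi - lo
            if span == 0:
                # A constant signal carries no contrast; map it to zeros.
                row = [0 for _ in series]
            else:
                # Assumed normalization: per-row min-max scaling to 0..255.
                row = [round(255 * (v - lo) / span) for v in series]
            rows.append(row)
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all signals must share the same window length")
    return rows


# Toy window: two signals per leg, three time steps each.
toy_window = {leg: [[0.0, 1.0, 4.0], [2.0, 2.0, 2.0]] for leg in LEG_ORDER}
image = to_proprioceptive_image(toy_window)
print(len(image), "rows x", len(image[0]), "columns")
```

Grouping rows by leg is what lets a 2D convolution see signals from the same limb as spatial neighbours, which is the intuition behind preserving the robot's physical layout in the image.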
Key results
- Achieves 94.5% contact state accuracy on the Contact Dataset, surpassing the MI-HGNN baseline (87.7%)
- Uses a temporal window 15× shorter than that of the MI-HGNN baseline
- Demonstrates consistent accuracy and generalization improvements across simulated and real-world quadruped locomotion datasets
- Introduces ConcatCNN, a multi-image CNN architecture that effectively fuses the proposed PI representations for contact classification
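To put the reported accuracies in perspective, the short calculation below derives the absolute gain and the relative reduction in error rate from the two figures above. The helper name `accuracy_gain` is hypothetical, and only the 87.7% and 94.5% values come from the source.

```python
def accuracy_gain(baseline_acc: float, new_acc: float) -> tuple:
    """Return (absolute gain in percentage points, relative error reduction)."""
    abs_gain = new_acc - baseline_acc
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    rel_err_reduction = (baseline_err - new_err) / baseline_err
    return abs_gain, rel_err_reduction


# Reported contact state accuracies (percent): MI-HGNN baseline vs. PI.
gain, reduction = accuracy_gain(87.7, 94.5)
print(f"absolute gain: {gain:.1f} points, error reduction: {reduction:.1%}")
```

Under these figures the gain is 6.8 percentage points, which corresponds to removing roughly 55% of the baseline's misclassifications.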
Why it matters
This cross-modal encoding strategy provides a robust, morphology-aware feature space that enhances robotic perception, enabling more stable and adaptive locomotion on complex terrains.
Abstract
This paper presents a novel approach for representing proprioceptive time-series data from quadruped robots as structured two-dimensional images, enabling the use of convolutional neural networks for learning locomotion-related tasks. The proposed method encodes temporal dynamics from multiple proprioceptive signals, such as joint positions, IMU readings, and foot velocities, while preserving the robot’s morphological structure in the spatial arrangement of the image. This transformation captures inter-signal correlations and gait-dependent patterns, providing a richer feature space than direct time-series processing. We apply this concept to the problem of contact estimation, a key capability for stable and adaptive locomotion on diverse terrains. Experimental evaluations on both real-world datasets and simulated environments show that our image-based representation consistently enhances prediction accuracy and generalization over conventional sequence-based models, underscoring the potential of cross-modal encoding strategies for robotic state learning. Our method achieves superior performance on the contact dataset, improving contact state accuracy from 87.7% to 94.5% over the recently proposed MI-HGNN method while using a window size 15 times shorter.