ICRA 2026

Proprioceptive Image: An Image Representation of Proprioceptive Data from Quadruped Robots for Contact Estimation Learning

Gabriel Fischer Abati, João Carlos Virgolino Soares, Giulio Turrisi, Victor Barasuol, Claudio Semini

AI summary

Transforming quadruped proprioceptive time-series data into morphology-aware 2D images boosts foot contact estimation accuracy and generalization over conventional sequence-based models.

Proprioceptive imaging · contact estimation · quadruped robots · CNN · time-series to image · robotic perception

Problem

Data-driven contact estimation for quadruped robots typically operates on raw time-series inputs, a form in which complex temporal dynamics and inter-signal correlations are hard to capture efficiently. This limits accuracy and generalization across diverse terrains.

Approach

The authors propose Proprioceptive Images (PI), a novel encoding method that converts multi-channel proprioceptive signals into structured 2D images that preserve both temporal dynamics and the robot's physical layout, enabling standard CNNs to extract richer spatial features.
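The general idea of turning a window of per-leg signals into a 2D array can be sketched as follows. This is a minimal illustration only, not the authors' encoding: the row layout (one block of rows per leg, time along the columns), the per-channel min-max scaling, the leg names, and the function name `to_proprioceptive_image` are all assumptions made for the example.

```python
import numpy as np

def to_proprioceptive_image(window, leg_order=("LF", "RF", "LH", "RH")):
    """Stack per-leg signal windows into one 2D image (hypothetical layout).

    window: dict mapping leg name -> array of shape (T, C), i.e. T time
            steps of C signals for that leg.
    Returns an array of shape (len(leg_order) * C, T) with values in [0, 1].
    """
    blocks = []
    for leg in leg_order:
        sig = np.asarray(window[leg], dtype=np.float64)   # (T, C)
        lo = sig.min(axis=0, keepdims=True)
        hi = sig.max(axis=0, keepdims=True)
        span = np.where(hi > lo, hi - lo, 1.0)            # avoid divide-by-zero
        blocks.append(((sig - lo) / span).T)              # (C, T), one row per signal
    # Rows from the same leg stay adjacent, so a 2D convolution sees
    # neighbouring signals of one leg together (assumed layout).
    return np.vstack(blocks)

# Toy data: 20 time steps, 3 signals per leg (random stand-ins for real sensors).
rng = np.random.default_rng(0)
window = {leg: rng.normal(size=(20, 3)) for leg in ("LF", "RF", "LH", "RH")}
image = to_proprioceptive_image(window)
print(image.shape)
```

The resulting array can be fed to any standard 2D CNN as a single-channel image; how the paper actually orders signals and legs within the image is not specified in this summary.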

Key results

Achieves 94.5% contact state accuracy on the Contact Dataset, surpassing the MI-HGNN baseline (87.7%)
Uses a temporal window 15× shorter than that of MI-HGNN
Demonstrates consistent accuracy and generalization improvements across simulated and real-world quadruped locomotion datasets
Introduces ConcatCNN, a multi-image CNN architecture that effectively fuses the proposed PI representations for contact classification
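To put the headline numbers in perspective, the short calculation below derives the absolute gain and the relative reduction in misclassified contact states from the two reported accuracies. The relative error reduction is computed here for illustration and is not a figure reported in the paper.

```python
baseline_acc = 87.7   # MI-HGNN accuracy on the Contact Dataset, in percent
pi_acc = 94.5         # Proprioceptive Image accuracy, in percent

abs_gain = pi_acc - baseline_acc                         # percentage points
rel_err_reduction = abs_gain / (100.0 - baseline_acc)    # share of baseline errors removed

print(f"{abs_gain:.1f} points, {rel_err_reduction:.1%} fewer errors")
```

In other words, the gain of roughly 6.8 percentage points corresponds to removing a little over half of the baseline's errors.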

Why it matters

This cross-modal encoding strategy provides a robust, morphology-aware feature space that enhances robotic perception, enabling more stable and adaptive locomotion on complex terrains.

Abstract

This paper presents a novel approach for representing proprioceptive time-series data from quadruped robots as structured two-dimensional images, enabling the use of convolutional neural networks for learning locomotion-related tasks. The proposed method encodes temporal dynamics from multiple proprioceptive signals, such as joint positions, IMU readings, and foot velocities, while preserving the robot’s morphological structure in the spatial arrangement of the image. This transformation captures inter-signal correlations and gait-dependent patterns, providing a richer feature space than direct time-series processing. We apply this concept to the problem of contact estimation, a key capability for stable and adaptive locomotion on diverse terrains. Experimental evaluations on both real-world datasets and simulated environments show that our image-based representation consistently enhances prediction accuracy and generalization over conventional sequence-based models, underscoring the potential of cross-modal encoding strategies for robotic state learning. Our method achieves superior performance on the contact dataset, improving contact state accuracy from 87.7% to 94.5% over the recently proposed MI-HGNN method while using a window size 15 times shorter.

Index terms

Legged Robots; AI-Based Methods