Latent Representations for Visual Proprioception in Inexpensive Robots
Sahara Sheikholeslami, Ladislau Bölöni
AI summary
Problem
Inexpensive robots often lack reliable internal sensors for joint position tracking, yet visual proprioception typically requires calibrated cameras, depth sensors, or simulators. This paper investigates how accurately a fast, single-pass model can recover robot configuration from a single uncalibrated RGB image under these resource-constrained conditions.
Approach
The authors evaluate four compact latent encoding methods—Conv-VAEs, fine-tuned CNN and ViT backbones, and uncalibrated fiducial marker detections—to compress a single robot image into a low-dimensional vector that feeds a simple MLP regressor for joint angle prediction.
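The encode-then-regress pipeline described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The encoder is a hypothetical stand-in that returns a fixed-length latent vector. The hidden width (64), the two hidden ReLU layers, and the class and function names are illustrative choices. The output is six joint angles, matching the 6-DoF robot described in the abstract. Making the latent width a constructor parameter lets one regressor design serve several latent sizes.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def make_layer(n_in: int, n_out: int, rng: random.Random):
    """Create a dense layer with small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x: Vector, layer, relu: bool) -> Vector:
    """Apply y = W x + b, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


class MLPRegressor:
    """Hypothetical MLP mapping a latent vector of configurable size to joint angles."""

    def __init__(self, latent_dim: int, n_joints: int = 6, hidden: int = 64, seed: int = 0):
        rng = random.Random(seed)
        self.latent_dim = latent_dim
        self.layers = [
            make_layer(latent_dim, hidden, rng),
            make_layer(hidden, hidden, rng),
            make_layer(hidden, n_joints, rng),
        ]

    def __call__(self, z: Vector) -> Vector:
        if len(z) != self.latent_dim:
            raise ValueError(f"expected latent of size {self.latent_dim}, got {len(z)}")
        h = z
        for i, layer in enumerate(self.layers):
            # ReLU on hidden layers; the final layer stays linear for regression.
            h = dense(h, layer, relu=i < len(self.layers) - 1)
        return h


def predict_joints(image, encoder: Callable[[object], Vector], regressor: MLPRegressor) -> Vector:
    """Single forward pass: image -> latent vector -> joint angle estimates."""
    return regressor(encoder(image))


if __name__ == "__main__":
    # Hypothetical stand-in for a Conv-VAE, CNN, or ViT encoder.
    def dummy_encoder(image) -> Vector:
        return [0.1] * 128

    reg = MLPRegressor(latent_dim=128)
    print(len(predict_joints(None, dummy_encoder, reg)))
```

In practice the weights would be trained on labeled images. Because the encoder only has to emit a vector of the agreed width, any of the four encoder families could be swapped in front of the same regressor.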
Key results
- Compared four latent encoding techniques (Conv-VAE, fine-tuned CNN/ViT backbones, uncalibrated fiducial markers)
- Introduced a universal, size-agnostic MLP regressor requiring only minimal supervised fine-tuning
- Showed that accuracy varies by pose component across nine models and two latent sizes
- Revealed distinct error and noise patterns to guide encoder selection for specific pose metrics
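Reporting accuracy per component, rather than as one pooled number, is what allows the error and noise patterns above to inform encoder choice. The following is a minimal sketch that treats each joint as a component. It assumes predictions and ground truth are lists of per-frame joint-angle vectors in degrees. Using mean absolute error, mean signed error (bias), and error standard deviation as the "error" and "noise" summaries is an assumption, not necessarily the paper's metric. The function name is hypothetical, and angle wrap-around is not handled.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def per_joint_errors(preds: Sequence[Sequence[float]],
                     truth: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Return MAE, bias, and error standard deviation for each joint."""
    if len(preds) != len(truth) or not preds:
        raise ValueError("preds and truth must be non-empty and the same length")
    n_joints = len(truth[0])
    stats = []
    for j in range(n_joints):
        # Signed error for joint j across all frames.
        errs = [p[j] - t[j] for p, t in zip(preds, truth)]
        stats.append({
            "mae": mean(abs(e) for e in errs),
            "bias": mean(errs),
            # Spread of the error around its mean, a simple proxy for prediction noise.
            "std": pstdev(errs),
        })
    return stats
```

For example, with predictions `[[10.0, 20.0], [12.0, 18.0]]` against ground truth `[[11.0, 20.0], [11.0, 20.0]]`, both joints have an MAE of 1.0. Joint 0 shows zero bias, while joint 1 shows a bias of -1.0. A pooled average would hide this difference.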
Why it matters
This work enables affordable robots to reliably estimate their own pose using minimal hardware, expanding the applicability of vision-based control in unstructured environments.
Abstract
Robotic manipulation requires explicit or implicit knowledge of the robot’s joint positions. Precise proprioception is standard in high-quality industrial robots but is often unavailable in inexpensive robots operating in unstructured environments. In this paper, we ask: to what extent can a fast, single-pass regression architecture perform visual proprioception from a single external camera image, available even in the simplest manipulation settings? We explore several latent representations, including CNNs, VAEs, ViTs, and bags of uncalibrated fiducial markers, using fine-tuning techniques adapted to the limited data available. We evaluate the achievable accuracy through experiments on an inexpensive 6-DoF robot.
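A bag of uncalibrated fiducial markers has to become a fixed-length vector before an MLP can consume it. One possible encoding, sketched here under stated assumptions, reserves a slot per known marker ID. Each slot holds a visibility flag and the marker's center in normalized image coordinates, and undetected markers are zero-filled. The slot layout, the hypothetical marker ID list, and the function name are illustrative and may differ from the paper's encoding. No camera calibration is needed because only raw 2D pixel positions are used.

```python
from typing import Dict, List, Sequence, Tuple

Corner = Tuple[float, float]


def encode_markers(detections: Dict[int, Sequence[Corner]], marker_ids: Sequence[int],
                   width: int, height: int) -> List[float]:
    """Encode detections as [visible, cx, cy] per known marker ID, in a fixed order.

    detections maps marker ID -> corner pixel coordinates from any fiducial detector.
    Undetected markers contribute [0.0, 0.0, 0.0], so the output length is
    always 3 * len(marker_ids) regardless of how many markers are visible.
    """
    vec: List[float] = []
    for mid in marker_ids:
        corners = detections.get(mid)
        if not corners:
            vec.extend([0.0, 0.0, 0.0])
            continue
        # Marker center as the mean of its corners, normalized by image size.
        cx = sum(x for x, _ in corners) / len(corners) / width
        cy = sum(y for _, y in corners) / len(corners) / height
        vec.extend([1.0, cx, cy])
    return vec
```

For instance, if marker 3 is detected with corners at (100, 50), (120, 50), (120, 70), and (100, 70) in a 640x480 image and marker 1 is missing, encoding with `marker_ids=[1, 3]` yields `[0.0, 0.0, 0.0, 1.0, 0.171875, 0.125]`. The fixed ordering keeps each input position tied to the same marker.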