CERNet: Class-Embedding Predictive-Coding RNN for Unified Robot Motion, Recognition, and Confidence Estimation
Hiroki Sawada, Alexandre Pitti, Mathias Quoy
AI summary
Problem
Existing robotic models typically separate motor generation, intention recognition, and confidence estimation into complex multi-module systems, leaving a gap for a unified, parameter-efficient framework validated on physical hardware under real-time disturbances.
Approach
CERNet employs a dynamically updated class-embedding vector within a multi-layer predictive-coding RNN to constrain hidden states for motion generation and optimize them online via prediction-error minimization for real-time recognition and self-evaluation.
Key results
- 76% lower trajectory reproduction error than a parameter-matched single-layer baseline
- Autonomous recovery from external perturbations while maintaining motion fidelity
- Online trajectory class inference with 68% Top-1 and 81% Top-2 accuracy
- Intrinsic confidence estimation derived directly from internal prediction errors
Why it matters
Provides a compact, extensible neural framework for robust motor memory and intent-sensitive human-robot collaboration on physical platforms.
Abstract
Robots interacting with humans must not only generate learned movements in real time, but also infer the intent behind observed behaviors and estimate the confidence of their own inferences. This paper proposes CERNet, a unified model that achieves all three capabilities within a single hierarchical predictive-coding recurrent neural network. CERNet uses a dynamically updated class embedding vector to unify motor generation and recognition. The model operates in two modes: generation and inference. In the generation mode, the class embedding constrains the hidden state dynamics to a class-specific subspace; in the inference mode, it is optimized online to minimize prediction error, enabling real-time recognition. Validated on a humanoid robot across 26 kinesthetically taught letters of the alphabet, our hierarchical model achieves 76% lower trajectory reproduction error than a parameter-matched single-layer baseline, maintains motion fidelity under external perturbations, and infers the demonstrated trajectory class online with 68% Top-1 and 81% Top-2 accuracy. Furthermore, internal prediction errors naturally reflect the model’s confidence in its recognition. This integration of robust generation, real-time recognition, and intrinsic uncertainty estimation within a single neural network framework offers a compact and extensible approach to motor memory in physical robots, with potential applications in intent-sensitive human–robot collaboration.
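The two modes described above can be illustrated with a minimal toy sketch. It is not the authors' implementation: it assumes a single-layer RNN with fixed random weights standing in for a trained hierarchical network, one-hot class embeddings, finite-difference gradients with a backtracking step, and class ranking by distance between the inferred and stored embeddings. All names (`generate`, `prediction_error`, `infer_embedding`, `rank_classes`) and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES, EMB_DIM, HID_DIM, OUT_DIM, T = 3, 3, 16, 2, 30

# Assumption: fixed random weights stand in for a trained network.
W_h = rng.normal(scale=0.5 / np.sqrt(HID_DIM), size=(HID_DIM, HID_DIM))
W_c = rng.normal(scale=1.0, size=(HID_DIM, EMB_DIM))
W_o = rng.normal(scale=1.0 / np.sqrt(HID_DIM), size=(OUT_DIM, HID_DIM))
# Assumption: one-hot vectors play the role of learned class embeddings.
class_embeddings = np.eye(N_CLASSES, EMB_DIM)


def generate(c, steps=T):
    """Generation mode: roll out a trajectory with the embedding c held fixed."""
    h = np.zeros(HID_DIM)
    outputs = []
    for _ in range(steps):
        h = np.tanh(W_h @ h + W_c @ c)  # embedding biases the hidden dynamics
        outputs.append(W_o @ h)
    return np.array(outputs)


def prediction_error(c, observed):
    """Mean squared error between the rollout under c and the observation."""
    return float(np.mean((generate(c, len(observed)) - observed) ** 2))


def numerical_grad(c, observed, eps=1e-5):
    """Central finite-difference gradient of the prediction error w.r.t. c."""
    grad = np.zeros_like(c)
    for i in range(len(c)):
        d = np.zeros_like(c)
        d[i] = eps
        grad[i] = (prediction_error(c + d, observed)
                   - prediction_error(c - d, observed)) / (2 * eps)
    return grad


def infer_embedding(observed, iters=100, lr=0.5):
    """Inference mode: adjust c (weights frozen) to reduce prediction error."""
    c = np.zeros(EMB_DIM)
    err = prediction_error(c, observed)
    for _ in range(iters):
        grad = numerical_grad(c, observed)
        step = lr
        while step > 1e-8:  # backtracking: accept only error-reducing steps
            cand = c - step * grad
            cand_err = prediction_error(cand, observed)
            if cand_err < err:
                c, err = cand, cand_err
                break
            step *= 0.5
    # The residual error is returned so it can be read as a rough,
    # assumed confidence proxy (lower error suggesting higher confidence).
    return c, err


def rank_classes(c_inferred):
    """Rank classes by distance from the inferred embedding (nearest first)."""
    dists = np.linalg.norm(class_embeddings - c_inferred, axis=1)
    return np.argsort(dists)


if __name__ == "__main__":
    observed = generate(class_embeddings[1]) + rng.normal(scale=0.01, size=(T, OUT_DIM))
    c_hat, residual = infer_embedding(observed)
    print("ranking:", rank_classes(c_hat), "residual error:", residual)
```

In this sketch the first two entries of the ranking would correspond to Top-1 and Top-2 candidates. Only the embedding is updated during inference while the weights stay frozen, which mirrors the abstract's description of recognition as online optimization of the class embedding.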