ICRA 2026

CERNet: Class-Embedding Predictive-Coding RNN for Unified Robot Motion, Recognition, and Confidence Estimation

Hiroki Sawada, Alexandre Pitti, Mathias Quoy

AI summary

A single hierarchical predictive-coding RNN unifies robust motor generation, real-time intention recognition, and intrinsic confidence estimation on a physical humanoid robot.

Predictive coding
Recurrent neural networks
Robot motion generation
Intention recognition
Confidence estimation
Humanoid robotics

Problem

Existing robotic models typically separate motor generation, intention recognition, and confidence estimation into complex multi-module systems, leaving a gap for a unified, parameter-efficient framework validated on physical hardware under real-time disturbances.

Approach

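CERNet places a class-embedding vector inside a multi-layer predictive-coding RNN and uses it in two modes. In generation mode, the embedding is held at the target class and constrains the hidden-state dynamics to a class-specific subspace. In inference mode, the embedding itself is optimized online to minimize prediction error, which yields real-time recognition and a built-in signal for self-evaluation.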
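The two modes can be illustrated with a deliberately tiny sketch. This is not CERNet: the trained RNN is replaced by a hypothetical lookup table (`PROTOTYPES`) of class-conditioned predictions, and the names `predict`, `generate`, `infer`, and the learning rate `lr` are assumptions made for illustration. What it does show is the core idea described above: with the predictor frozen, gradient descent on the squared prediction error moves the class embedding toward the class that explains the observations.

```python
# Toy stand-in for a frozen, trained predictor (NOT the CERNet architecture).
# PROTOTYPES[k][t % 4] is the value the predictor would output at step t for class k.
PROTOTYPES = [
    [1.0, 0.0, -1.0, 0.0],   # hypothetical class 0
    [0.0, 1.0, 0.0, -1.0],   # hypothetical class 1
]

def predict(embedding, t):
    """Prediction at step t: embedding-weighted mix of class prototypes."""
    return sum(c * proto[t % len(proto)] for c, proto in zip(embedding, PROTOTYPES))

def generate(embedding, steps):
    """Generation mode: the embedding is fixed and shapes the output."""
    return [predict(embedding, t) for t in range(steps)]

def infer(observations, lr=0.5):
    """Inference mode: update only the embedding to reduce prediction error."""
    n = len(PROTOTYPES)
    embedding = [1.0 / n] * n          # start with no class preference
    errors = []
    for t, x in enumerate(observations):
        err = x - predict(embedding, t)
        errors.append(abs(err))
        # Gradient of 0.5 * err**2 w.r.t. embedding[k] is -err * PROTOTYPES[k][t],
        # so one descent step adds lr * err * PROTOTYPES[k][t].
        embedding = [c + lr * err * proto[t % len(proto)]
                     for c, proto in zip(embedding, PROTOTYPES)]
    return embedding, errors

demo = generate([0.0, 1.0], 4)         # "demonstrate" class 1
embedding, errors = infer(demo)        # recognize it from observations alone
recognized = max(range(len(embedding)), key=lambda k: embedding[k])
print(recognized, embedding, errors)
```

In this toy setting the per-step errors shrink as the embedding converges, which hints at how prediction error could double as a rough confidence signal; how the paper actually maps error to confidence is not specified in this summary.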

Key results

76% lower trajectory reproduction error than a parameter-matched single-layer baseline
Autonomous recovery from external perturbations while maintaining motion fidelity
Online trajectory class inference with 68% Top-1 and 81% Top-2 accuracy
Intrinsic confidence estimation derived directly from internal prediction errors
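
For readers unfamiliar with the Top-1 and Top-2 figures above, the following sketch shows how such metrics are commonly computed: a trial counts as a Top-k hit when the true class is among the k highest-scoring candidates. The function name `top_k_accuracy` and the scores and labels are made-up illustrations, not data from the paper.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of trials whose true label is among the k best-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Hypothetical per-class scores for four trials over three classes.
scores = [
    [0.7, 0.2, 0.1],   # true class 0 ranked first
    [0.3, 0.5, 0.2],   # true class 0 ranked second
    [0.1, 0.3, 0.6],   # true class 0 ranked last
    [0.2, 0.1, 0.7],   # true class 2 ranked first
]
labels = [0, 0, 0, 2]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
print(top1, top2)
```

Top-2 is never lower than Top-1, since every Top-1 hit is also a Top-2 hit; the gap between the two indicates how often the correct class is the runner-up.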

Why it matters

Provides a compact, extensible neural framework for robust motor memory and intent-sensitive human-robot collaboration on physical platforms.

Abstract

Robots interacting with humans must not only generate learned movements in real time, but also infer the intent behind observed behaviors and estimate the confidence of their own inferences. This paper proposes CERNet, a unified model that achieves all three capabilities within a single hierarchical predictive-coding recurrent neural network by means of a dynamically updated class embedding vector that links motor generation and recognition. The model operates in two modes: generation and inference. In the generation mode, the class embedding constrains the hidden state dynamics to a class-specific subspace; in the inference mode, it is optimized online to minimize prediction error, enabling real-time recognition. Validated on a humanoid robot across 26 kinesthetically taught alphabet letters, our hierarchical model achieves 76% lower trajectory reproduction error than a parameter-matched single-layer baseline, maintains motion fidelity under external perturbations, and infers the demonstrated trajectory class online with 68% Top-1 and 81% Top-2 accuracy. Furthermore, internal prediction errors naturally reflect the model’s confidence in its recognition. This integration of robust generation, real-time recognition, and intrinsic uncertainty estimation within a single neural network framework offers a compact and extensible approach to motor memory in physical robots, with potential applications in intent-sensitive human–robot collaboration.

Index terms

Bioinspired Robot Learning
Neurorobotics
Intention Recognition