ICRA 2026

VividFace: Real-Time and Realistic Facial Expression Shadowing for Humanoid Robots

Peizhen Li, Longbing Cao, Xiao-Ming Wu, Yang Zhang

AI summary

VividFace enables humanoid robots to realistically mimic human facial expressions in real time, within 0.05 seconds, while reproducing subtle details such as nose wrinkles and frowning.

facial expression imitation, humanoid robots, real-time processing, affective HRI, deep learning, X2CNet++

Problem

Existing humanoid facial expression imitation methods fail to simultaneously achieve real-time performance and realistic expressiveness, often missing subtle facial nuances or suffering from high computational latency.

Approach

The authors introduce VividFace, which combines a fine-tuned motion transfer module and a feature-adaptation training strategy within the X2CNet++ framework, alongside an asynchronous video-streaming pipeline to enable low-latency, real-time expression shadowing.
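
The asynchronous streaming idea can be illustrated with a small sketch. This is not the authors' implementation: the function names, the frame tuple layout, and the timing values are all hypothetical, and the `asyncio.sleep` calls stand in for camera capture, model inference, and sending control values to the robot. It shows one common way to keep latency low in a streaming setup, namely a single-slot queue in which a newer frame replaces any frame that has not yet been processed.

```python
import asyncio
import time


async def capture_frames(queue, n_frames, period_s):
    """Hypothetical camera loop: emits (frame_id, capture_time) tuples."""
    for frame_id in range(n_frames):
        if queue.full():
            queue.get_nowait()  # drop the stale frame; only the newest is kept
        queue.put_nowait((frame_id, time.perf_counter()))
        await asyncio.sleep(period_s)  # stand-in for waiting on the next frame
    if queue.full():
        queue.get_nowait()
    queue.put_nowait(None)  # sentinel: tells the consumer to stop


async def drive_robot(queue, infer_s, latencies):
    """Hypothetical consumer: 'infers' on the newest frame and records latency."""
    while True:
        item = await queue.get()
        if item is None:
            break
        frame_id, t_capture = item
        await asyncio.sleep(infer_s)  # stand-in for inference + robot I/O
        latencies.append((frame_id, time.perf_counter() - t_capture))


async def run_pipeline(n_frames=20, period_s=0.005, infer_s=0.012):
    """Run capture and control concurrently; return (frame_id, latency) pairs."""
    queue = asyncio.Queue(maxsize=1)  # single slot: never queue up old frames
    latencies = []
    await asyncio.gather(
        capture_frames(queue, n_frames, period_s),
        drive_robot(queue, infer_s, latencies),
    )
    return latencies


if __name__ == "__main__":
    for frame_id, latency in asyncio.run(run_pipeline()):
        print(f"frame {frame_id}: {latency * 1000:.1f} ms from capture to command")
```

Because the consumer in this sketch is slower than the producer, some frames are skipped rather than queued, so the delay between capture and command stays close to the processing time instead of growing with the backlog.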

Key results

Achieves expression mimicry within 0.05 seconds
Successfully transfers subtle facial details like nose wrinkles and frowning
Generalizes effectively across diverse human facial configurations
Validated through extensive real-world demonstrations on the Ameca humanoid robot

Why it matters

Enables more empathetic and engaging human-robot interaction for social assistance, education, and healthcare applications.

Abstract

Humanoid facial expression shadowing enables robots to realistically imitate human facial expressions in real time, which is critical for lifelike, facially expressive humanoid robots and affective human–robot interaction. Existing progress in humanoid facial expression imitation remains limited, often failing to achieve either real-time performance or realistic expressiveness due to offline video-based inference designs and insufficient ability to capture and transfer subtle expression details. To address these limitations, we present VividFace, a real-time and realistic facial expression shadowing system for humanoid robots. An optimized imitation framework X2CNet++ enhances expressiveness by fine-tuning the human-to-humanoid facial motion transfer module and introducing a feature-adaptation training strategy for better alignment across different image sources. Real-time shadowing is further enabled by a video-stream-compatible inference pipeline and a streamlined workflow based on asynchronous I/O for efficient communication across devices. VividFace produces vivid humanoid faces by mimicking human facial expressions within 0.05 seconds, while generalizing across diverse facial configurations. Extensive real-world demonstrations validate its practical utility. Videos are available at: https://lipzh5.github.io/VividFace/.

Index terms

Gesture, Posture and Facial Expressions; Emotional Robotics; Human and Humanoid Motion Analysis and Synthesis