VividFace: Real-Time and Realistic Facial Expression Shadowing for Humanoid Robots
Peizhen Li, Longbing Cao, Xiao-Ming Wu, Yang Zhang
AI summary
Problem
Existing humanoid facial expression imitation methods fail to simultaneously achieve real-time performance and realistic expressiveness, often missing subtle facial nuances or suffering from high computational latency.
Approach
The authors introduce VividFace, which combines a fine-tuned motion transfer module and a feature-adaptation training strategy within the X2CNet++ framework, alongside an asynchronous video-streaming pipeline to enable low-latency, real-time expression shadowing.
Key results
- Mimics human facial expressions within 0.05 seconds
- Successfully transfers subtle facial details like nose wrinkles and frowning
- Generalizes effectively across diverse human facial configurations
- Validated through extensive real-world demonstrations on the Ameca humanoid robot
Why it matters
Enables more empathetic and engaging human-robot interaction for social assistance, education, and healthcare applications.
Abstract
Humanoid facial expression shadowing enables robots to realistically imitate human facial expressions in real time, which is critical for lifelike, facially expressive humanoid robots and affective human–robot interaction. Existing progress in humanoid facial expression imitation remains limited, often failing to achieve either real-time performance or realistic expressiveness due to offline video-based inference designs and insufficient ability to capture and transfer subtle expression details. To address these limitations, we present VividFace, a real-time and realistic facial expression shadowing system for humanoid robots. An optimized imitation framework X2CNet++ enhances expressiveness by fine-tuning the human-to-humanoid facial motion transfer module and introducing a feature-adaptation training strategy for better alignment across different image sources. Real-time shadowing is further enabled by a video-stream-compatible inference pipeline and a streamlined workflow based on asynchronous I/O for efficient communication across devices. VividFace produces vivid humanoid faces by mimicking human facial expressions within 0.05 seconds, while generalizing across diverse facial configurations. Extensive real-world demonstrations validate its practical utility. Videos are available at: https://lipzh5.github.io/VividFace/.
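The abstract's asynchronous-I/O workflow can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the single-slot queue, the frame-dropping policy, and the timing constants are all assumptions chosen for illustration. The idea shown is that frame capture and inference run concurrently, and the inference stage always works on the newest frame, so a slow model does not build up a backlog.

```python
import asyncio


async def capture_frames(queue, n_frames, interval):
    """Hypothetical camera loop: emits frame ids and keeps only the newest one."""
    for frame_id in range(n_frames):
        if queue.full():
            queue.get_nowait()  # drop the stale frame so latency stays bounded
        queue.put_nowait(frame_id)
        await asyncio.sleep(interval)  # stands in for waiting on the next frame
    if queue.full():
        queue.get_nowait()
    queue.put_nowait(None)  # sentinel: tells the consumer the stream has ended


async def infer_controls(frame_id):
    """Placeholder for a motion-transfer model mapping a frame to control values."""
    await asyncio.sleep(0.02)  # assumed inference time, illustrative only
    return {"frame": frame_id, "controls": [0.0, 0.0]}


async def shadow(queue, sent):
    """Consume the newest available frame, run inference, and 'send' the result."""
    while True:
        frame_id = await queue.get()
        if frame_id is None:
            break
        controls = await infer_controls(frame_id)
        sent.append(controls)  # stands in for an async network send to the robot


async def run_pipeline(n_frames=20, interval=0.005):
    queue = asyncio.Queue(maxsize=1)  # single slot: only the latest frame is kept
    sent = []
    await asyncio.gather(
        capture_frames(queue, n_frames, interval),
        shadow(queue, sent),
    )
    return sent


if __name__ == "__main__":
    results = asyncio.run(run_pipeline())
    print(f"processed {len(results)} of 20 captured frames")
```

When inference is slower than capture, as with the assumed constants here, some frames are skipped rather than queued. That trades completeness for low latency, which is the usual choice for a real-time mimicry loop.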