RealMirror: A Comprehensive, Open-Source Vision-Language-Action Platform for Embodied AI
Cong Tai, Zhaoyu Zheng, Haixu Long, Hansheng Wu, Haodong Xiang, Zhengbin Long, Jun Xiong, Rong Shi, Shizhuang Zhang, Gang Qiu, He Wang, Ruifeng Li, Jun Huang, Bin Chang, Shuai Feng, Tao Shen
AI summary
Problem
Humanoid VLA research is hindered by high data acquisition costs, a lack of standardized benchmarks, and a significant reality gap between simulation and real-world deployment.
Approach
The authors built RealMirror, an open-source platform that integrates VR teleoperation-based data collection, a unified training/inference framework, a multi-scenario benchmark, and a Sim2Real pipeline using generative models and 3D Gaussian Splatting to create photorealistic digital twins.
Key results
- Low-cost, end-to-end data collection and training system
- Open-source humanoid VLA benchmark with 1,200 trajectories across five scenarios
- Zero-shot Sim2Real transfer enabling simulation-trained models to control real robots without fine-tuning
- Photorealistic environment and robot reconstruction via 3D Gaussian Splatting and generative models
Why it matters
It provides researchers and developers with a unified, reproducible framework to accelerate the development and deployment of general-purpose humanoid robots.
Abstract
The emerging field of Vision-Language-Action (VLA) for humanoid robots faces several fundamental challenges, including the high cost of data acquisition, the lack of a standardized benchmark, and the significant gap between simulation and the real world. To overcome these obstacles, we propose RealMirror, a comprehensive, open-source embodied AI VLA platform. RealMirror builds an efficient, low-cost data collection, model training, and inference system that enables end-to-end VLA research without requiring a real robot. To facilitate model evolution and fair comparison, we also introduce a dedicated VLA benchmark for humanoid robots, featuring multiple scenarios, extensive trajectories, and various VLA models. Furthermore, by integrating generative models and 3D Gaussian Splatting to reconstruct realistic environments and robot models, we successfully demonstrate zero-shot Sim2Real transfer, where models trained exclusively on simulation data can perform tasks on a real robot seamlessly, without any fine-tuning. In conclusion, with the unification of these critical components, RealMirror provides a robust framework that significantly accelerates the development of VLA models for humanoid robots. Project page: https://terminators2025.github.io/RealMirror.github.io
† Equal contribution. * Corresponding author. Emails: shen.tao5@zte.com.cn
Cong Tai, Zhaoyu Zheng, Haixu Long, Hansheng Wu, Haodong Xiang, Zhengbin Long, Rong Shi, Shizhuang Zhang, Gang Qiu, He Wang, Ruifeng Li, Jun Huang, Bin Chang, Shuai Feng, Tao Shen are with ZTE Corporation, China. Jun Xiong is with The Chinese University of Hong Kong, Shenzhen, China.