ICRA 2026

RealMirror: A Comprehensive, Open-Source Vision-Language-Action Platform for Embodied AI

Cong Tai, Zhaoyu Zheng, Haixu Long, Hansheng Wu, Haodong Xiang, Zhengbin Long, Jun Xiong, Rong Shi, Shizhuang Zhang, Gang Qiu, He Wang, Ruifeng Li, Jun Huang, Bin Chang, Shuai Feng, Tao Shen


AI summary


RealMirror enables zero-shot Sim2Real transfer for humanoid VLA models by combining photorealistic simulation with an open-source benchmark and an end-to-end pipeline, eliminating the need for real-robot fine-tuning.

Vision-Language-Action · Sim2Real · Humanoid Robots · 3D Gaussian Splatting · Embodied AI · Open-Source Benchmark

Problem

Humanoid VLA research is hindered by high data acquisition costs, a lack of standardized benchmarks, and a significant reality gap between simulation and real-world deployment.

Approach

The authors built RealMirror, an open-source platform that integrates VR teleoperation-based data collection, a unified training/inference framework, a multi-scenario benchmark, and a Sim2Real pipeline using generative models and 3D Gaussian Splatting to create photorealistic digital twins.
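For readers new to 3D Gaussian Splatting, the per-pixel rendering rule of the original 3DGS method is front-to-back alpha compositing of depth-sorted Gaussians: C = Σ_i c_i α_i Π_{j<i} (1 − α_j), where each α_i is the Gaussian's opacity scaled by its projected 2D falloff at that pixel. The Python sketch below illustrates only that general rule. It is not RealMirror's reconstruction code, and the function names `splat_alpha` and `composite_pixel` are hypothetical. It assumes the Gaussians have already been projected to 2D and sorted near-to-far, and it ignores view-dependent color (spherical harmonics) and the clamping and early-termination tricks used in real rasterizers.

```python
import numpy as np


def splat_alpha(opacity, mean2d, cov2d, pixel_xy):
    """Hypothetical helper: opacity of one projected 2D Gaussian at a pixel.

    alpha = opacity * exp(-0.5 * d^T Sigma^{-1} d), with d = pixel - mean.
    """
    d = np.asarray(pixel_xy, dtype=float) - np.asarray(mean2d, dtype=float)
    # Solve Sigma x = d instead of forming the explicit inverse.
    power = -0.5 * d @ np.linalg.solve(np.asarray(cov2d, dtype=float), d)
    return float(opacity * np.exp(power))


def composite_pixel(colors, alphas):
    """Hypothetical helper: front-to-back alpha compositing at one pixel.

    colors: (N, 3) RGB per Gaussian, already sorted near-to-far.
    alphas: (N,) per-Gaussian alpha at this pixel, each in [0, 1].
    Returns C = sum_i c_i * a_i * prod_{j<i} (1 - a_j).
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    pixel = np.zeros(3)
    transmittance = 1.0  # share of light not yet absorbed by nearer splats
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * c
        transmittance *= 1.0 - a
    return pixel


# Usage: a half-transparent red splat in front of an opaque blue one.
a_front = splat_alpha(0.5, mean2d=[0, 0], cov2d=[[1, 0], [0, 1]], pixel_xy=[0, 0])
print(composite_pixel([[1, 0, 0], [0, 0, 1]], [a_front, 1.0]))
```

Because the front splat passes half the light through, the pixel ends up an equal mix of red and blue. Photorealistic digital twins come from optimizing millions of such Gaussians so that rendered views match captured images.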

Key results

Low-cost, end-to-end data collection and training system
Open-source humanoid VLA benchmark with 1,200 trajectories across five scenarios
Zero-shot Sim2Real transfer enabling simulation-trained models to control real robots without fine-tuning
Photorealistic environment and robot reconstruction via 3D Gaussian Splatting and generative models

Why it matters

RealMirror provides researchers and developers with a unified, reproducible framework to accelerate the development and deployment of general-purpose humanoid robots.

Abstract

The emerging field of Vision-Language-Action (VLA) for humanoid robots faces several fundamental challenges, including the high cost of data acquisition, the lack of a standardized benchmark, and the significant gap between simulation and the real world. To overcome these obstacles, we propose RealMirror, a comprehensive, open-source embodied AI VLA platform. RealMirror builds an efficient, low-cost data collection, model training, and inference system that enables end-to-end VLA research without requiring a real robot. To facilitate model evolution and fair comparison, we also introduce a dedicated VLA benchmark for humanoid robots, featuring multiple scenarios, extensive trajectories, and various VLA models. Furthermore, by integrating generative models and 3D Gaussian Splatting to reconstruct realistic environments and robot models, we successfully demonstrate zero-shot Sim2Real transfer, where models trained exclusively on simulation data can perform tasks on a real robot seamlessly, without any fine-tuning. In conclusion, with the unification of these critical components, RealMirror provides a robust framework that significantly accelerates the development of VLA models for humanoid robots.

Project page: https://terminators2025.github.io/RealMirror.github.io

Cong Tai, Zhaoyu Zheng, Haixu Long, Hansheng Wu, Haodong Xiang, Zhengbin Long, Rong Shi, Shizhuang Zhang, Gang Qiu, He Wang, Ruifeng Li, Jun Huang, Bin Chang, Shuai Feng, and Tao Shen are with ZTE Corporation, China. Jun Xiong is with The Chinese University of Hong Kong, Shenzhen, China. Corresponding author email: shen.tao5@zte.com.cn

Index terms

Software Tools for Benchmarking and Reproducibility; Deep Learning in Grasping and Manipulation; Simulation and Animation