GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation
Guangqi Jiang, Haoran Chang, Ri-Zhao Qiu, Yutong Liang, Mazeyu Ji, Jiyue Zhu, Xueyan Zou, Zhao Dong, Xiaolong Wang
AI summary
Problem
Training manipulation policies involves a trade-off: simulation offers an action space aligned with the robot but poor visual fidelity, while real-world data offers realistic visuals but is costly and hard to scale. Existing simulators also lack photorealistic rendering, metric scale, and reproducible cross-embodiment benchmarking.
Approach
GSWorld reconstructs metric-accurate digital twins from real-world captures using 3D Gaussian Splatting and ArUco markers, then couples photorealistic rendering with physics engines to enable native action-space control and closed-loop policy training.
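To illustrate how fiducial markers can give a reconstruction metric scale, here is a minimal sketch, not GSWorld's actual pipeline. It assumes the four corners of one ArUco marker have already been located in the reconstruction's arbitrary-scale frame and that the printed side length of the marker is known. The function names (`metric_scale`, `rescale_points`) and the example coordinates are hypothetical.

```python
import math

def edge_lengths(corners):
    """Lengths of the four sides of a marker, given its corners in order."""
    return [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]

def metric_scale(corners, marker_side_m):
    """Factor that maps reconstruction units to meters (assumed uniform)."""
    mean_edge = sum(edge_lengths(corners)) / 4.0
    if mean_edge <= 0.0:
        raise ValueError("degenerate marker corners")
    return marker_side_m / mean_edge

def rescale_points(points, scale):
    """Apply a uniform scale to 3D points, e.g. Gaussian centers."""
    return [tuple(scale * c for c in p) for p in points]

# Hypothetical data: a marker that spans 2 reconstruction units per side
# but was printed with a 0.1 m side length.
corners = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
scale = metric_scale(corners, marker_side_m=0.1)
centers = rescale_points([(4.0, 2.0, 1.0)], scale)
```

In a full 3D Gaussian Splatting scene, the same factor would presumably also be applied to each Gaussian's extent, not only to its center, so that the rendered geometry and the physics engine agree on units.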
Key results
- Zero-shot sim-to-real transfer for visual imitation and reinforcement learning
- Automated closed-loop DAgger data collection for continuous policy improvement
- Reproducible visual benchmarking across multiple robot embodiments and tasks
- Scalable parallelized RL training with significantly reduced sim2real visual gaps
Why it matters
Enables robotics researchers to develop, evaluate, and deploy manipulation policies faster and more reliably by reducing the need for costly real-world data collection and narrowing the sim-to-real gap.
Abstract
This paper presents GSWorld, a robust, photo-realistic simulator for robotics manipulation that combines 3D Gaussian Splatting with physics engines. Our framework advocates ‘closing the loop’ of developing manipulation policies with reproducible evaluation of policies learned from real-robot data and sim2real policy training without using real robots. To enable photo-realistic rendering of diverse scenes, we propose a new asset format, which we term GSDF (Gaussian Scene Description File), that infuses Gaussian-on-Mesh representation with robot URDF and other objects. With a streamlined reconstruction pipeline, we curate a database of GSDF that contains 3 robot embodiments for single-arm and bimanual manipulation, as well as more than 40 objects. Combining GSDF with physics engines, we demonstrate several immediate interesting applications: (1) learning zero-shot sim2real pixel-to-action manipulation policy with photo-realistic rendering, (2) automated high-quality DAgger data collection for adapting policies to deployment environments, (3) reproducible benchmarking of real-robot manipulation policies in simulation, (4) simulation data collection by virtual teleoperation, and (5) zero-shot sim2real visual reinforcement learning.
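For readers unfamiliar with DAgger, the loop behind application (2) can be sketched on a toy problem. This is a generic illustration of dataset aggregation, not the GSWorld implementation: the one-dimensional state, the dynamics, the linear learner, and the scripted `expert_action` are all assumptions chosen to keep the example self-contained. In GSWorld the rollouts would instead come from the photo-realistic simulator.

```python
def expert_action(state):
    # Hypothetical expert: move halfway toward the origin.
    return -0.5 * state

def rollout(weight, start_state, horizon):
    """Run the learner's policy a = weight * s and return the visited states."""
    states, state = [], start_state
    for _ in range(horizon):
        states.append(state)
        state = state + weight * state  # toy dynamics: s' = s + a
    return states

def fit_linear_policy(dataset):
    """Least-squares fit of a = w * s through the origin."""
    denom = sum(s * s for s, _ in dataset)
    if denom == 0.0:
        return 0.0
    return sum(s * a for s, a in dataset) / denom

def dagger(iterations=3, start_state=1.0, horizon=5):
    dataset, weight = [], 0.0
    for _ in range(iterations):
        # 1. Roll out the current learner, not the expert.
        visited = rollout(weight, start_state, horizon)
        # 2. Label the states the learner actually reached with expert actions.
        dataset.extend((s, expert_action(s)) for s in visited)
        # 3. Retrain on the aggregated dataset.
        weight = fit_linear_policy(dataset)
    return weight, dataset

weight, dataset = dagger()
```

The key design point is step 1: because states are collected under the learner's own policy, the aggregated dataset covers the situations the learner will actually encounter at deployment, which is what makes the data collection closed-loop.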