ICRA 2026

GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation

Guangqi Jiang, Haoran Chang, Ri-Zhao Qiu, Yutong Liang, Mazeyu Ji, Jiyue Zhu, Xueyan Zou, Zhao Dong, Xiaolong Wang

AI summary

GSWorld enables zero-shot sim-to-real transfer and closed-loop policy improvement for robot manipulation by coupling 3D Gaussian Splatting with physics in a photorealistic, metric-accurate simulation.

3D Gaussian Splatting, Sim-to-Real Transfer, Robot Manipulation, Photorealistic Simulation, Closed-Loop Learning, Digital Twin

Problem

Training manipulation policies involves a trade-off: simulation offers an aligned action space but poor visuals, while real-world data offers realistic visuals but is costly and hard to scale. Existing simulators also lack photorealistic rendering, metric scale, and reproducible cross-embodiment benchmarking.

Approach

GSWorld reconstructs metric-accurate digital twins from real-world captures using 3D Gaussian Splatting and ArUco markers, then couples photorealistic rendering with physics engines to enable native action-space control and closed-loop policy training.
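The summary says ArUco markers make the reconstruction metric-accurate but does not say how. One plausible ingredient, sketched here under that assumption, is recovering a uniform scale factor from a marker of known printed size: triangulate its four corners in the arbitrary-scale reconstruction frame, compare their mean edge length with the physical side length, and rescale the Gaussians. The function names and the 0.1 m marker size are illustrative assumptions, and a full alignment would also need a rotation and translation into the robot frame, which this sketch omits.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def edge_lengths(corners: Sequence[Vec3]) -> List[float]:
    # Lengths of the edges of a square marker whose corners are given in order.
    n = len(corners)
    return [math.dist(corners[i], corners[(i + 1) % n]) for i in range(n)]


def metric_scale(corners: Sequence[Vec3], marker_side_m: float) -> float:
    # Hypothetical helper: factor converting reconstruction units to meters,
    # assuming the 4 marker corners were triangulated in the reconstruction
    # frame and the printed side length is known.
    if len(corners) != 4:
        raise ValueError("expected 4 marker corners")
    mean_edge = sum(edge_lengths(corners)) / 4.0
    return marker_side_m / mean_edge


def rescale_gaussians(means: Sequence[Vec3], scales: Sequence[Vec3], s: float):
    # Apply a uniform scale s to Gaussian centers and per-axis extents.
    new_means = [tuple(s * c for c in m) for m in means]
    new_scales = [tuple(s * c for c in sc) for sc in scales]
    return new_means, new_scales


corners = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
s = metric_scale(corners, marker_side_m=0.1)
means, scales = rescale_gaussians([(2.0, 4.0, 6.0)], [(1.0, 1.0, 1.0)], s)
```

Here a 0.1 m marker whose corners reconstruct 2.0 units apart gives a scale of 0.05 m per reconstruction unit, which is then applied uniformly to every Gaussian.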

Key results

Zero-shot sim-to-real transfer for visual imitation and reinforcement learning
Automated closed-loop DAgger data collection for continuous policy improvement
Reproducible visual benchmarking across multiple robot embodiments and tasks
Scalable parallelized RL training with significantly reduced sim2real visual gaps
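The automated DAgger collection listed above follows the general DAgger pattern: roll out the current learner, have an expert label every state the learner visits, aggregate those labels, and retrain. The sketch below shows that loop in a toy 1-D setting. The scripted expert (a proportional controller that reads the true simulator state), the linear learner, and the dynamics are hypothetical stand-ins, not GSWorld's actual expert or policy.

```python
import random
from typing import Callable, List, Tuple

Policy = Callable[[float], float]


def expert_action(x: float) -> float:
    # Hypothetical scripted expert: a proportional controller that can read
    # the true simulator state, so it can label any state the learner visits.
    return -0.5 * x


def step(x: float, a: float) -> float:
    # Toy 1-D dynamics standing in for the physics engine.
    return x + a


def rollout(policy: Policy, x0: float, horizon: int) -> List[float]:
    # Run a policy from x0 and record the visited states.
    states, x = [], x0
    for _ in range(horizon):
        states.append(x)
        x = step(x, policy(x))
    return states


def fit_linear(data: List[Tuple[float, float]]) -> float:
    # Least-squares fit of a = w * x (no bias term) on the aggregated dataset.
    sxx = sum(x * x for x, _ in data)
    sxa = sum(x * a for x, a in data)
    return sxa / sxx if sxx > 0 else 0.0


def dagger(iterations: int = 5, episodes: int = 4, horizon: int = 10, seed: int = 0):
    rng = random.Random(seed)
    dataset: List[Tuple[float, float]] = []
    w = 0.0  # learner parameter before any training
    for i in range(iterations):
        # Iteration 0 rolls out the expert (beta = 1); later iterations roll
        # out the current learner (beta = 0).
        policy: Policy = expert_action if i == 0 else (lambda x, w=w: w * x)
        for _ in range(episodes):
            for x in rollout(policy, rng.uniform(-1.0, 1.0), horizon):
                dataset.append((x, expert_action(x)))  # expert relabels each visited state
        w = fit_linear(dataset)  # retrain on the aggregated dataset
    return w, dataset
```

Calling `dagger()` returns 200 labeled states (5 iterations × 4 episodes × 10 steps) and a learner weight of about -0.5, matching the expert. Rolling out the expert only in the first iteration (β₁ = 1, then 0) is a schedule suggested in the original DAgger paper. In a simulator the expert can label states without a real robot, which is what makes the collection automatic.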

Why it matters

Helps robotics researchers develop, evaluate, and deploy manipulation policies faster and more reliably by reducing reliance on costly real-world data collection and narrowing the sim-to-real gap.

Abstract

This paper presents GSWorld, a robust, photo-realistic simulator for robotic manipulation that combines 3D Gaussian Splatting with physics engines. Our framework advocates ‘closing the loop’ of developing manipulation policies with reproducible evaluation of policies learned from real-robot data and sim2real policy training without using real robots. To enable photo-realistic rendering of diverse scenes, we propose a new asset format, which we term GSDF (Gaussian Scene Description File), that infuses a Gaussian-on-Mesh representation with robot URDFs and other objects. With a streamlined reconstruction pipeline, we curate a database of GSDF assets that contains 3 robot embodiments for single-arm and bimanual manipulation, as well as more than 40 objects. Combining GSDF with physics engines, we demonstrate several immediate applications: (1) learning zero-shot sim2real pixel-to-action manipulation policies with photo-realistic rendering, (2) automated high-quality DAgger data collection for adapting policies to deployment environments, (3) reproducible benchmarking of real-robot manipulation policies in simulation, (4) simulation data collection by virtual teleoperation, and (5) zero-shot sim2real visual reinforcement learning.
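The abstract describes GSDF as binding a Gaussian-on-Mesh representation to robot URDFs and objects, but the file schema is not given here. The following is therefore a hypothetical in-memory layout that illustrates the idea. Each Gaussian is attached to a triangle of a body's mesh through barycentric coordinates, so when the physics engine moves the body, the Gaussian centers follow. All class and field names are assumptions. Poses are simplified to a yaw rotation plus translation instead of full SE(3), and only Gaussian centers are updated (a real renderer would also rotate each Gaussian's covariance).

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BoundGaussian:
    # Hypothetical Gaussian-on-Mesh binding: triangle index + barycentric weights.
    triangle: int
    bary: Tuple[float, float, float]  # weights summing to 1
    color: Vec3  # RGB; spherical-harmonic coefficients omitted


@dataclass
class BodyAsset:
    # One rigid body (a URDF link or an object), mesh in the body's local frame.
    vertices: List[Vec3]
    triangles: List[Tuple[int, int, int]]
    gaussians: List[BoundGaussian] = field(default_factory=list)


@dataclass
class Pose:
    yaw: float  # rotation about z in radians (simplified from full SE(3))
    t: Vec3     # translation

    def apply(self, p: Vec3) -> Vec3:
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        x, y, z = p
        return (c * x - s * y + self.t[0], s * x + c * y + self.t[1], z + self.t[2])


@dataclass
class GSDFScene:
    # Hypothetical top-level asset: URDF path for kinematics plus one
    # Gaussian-bound mesh per link or object name.
    urdf_path: str
    bodies: Dict[str, BodyAsset]


def gaussian_centers(body: BodyAsset, pose: Pose) -> List[Vec3]:
    # Move the mesh to its world pose, then place each Gaussian at its
    # barycentric location on the transformed triangle.
    world = [pose.apply(v) for v in body.vertices]
    centers = []
    for g in body.gaussians:
        i, j, k = body.triangles[g.triangle]
        a, b, c = g.bary
        centers.append(tuple(a * world[i][d] + b * world[j][d] + c * world[k][d] for d in range(3)))
    return centers


def scene_centers(scene: GSDFScene, poses: Dict[str, Pose]) -> Dict[str, List[Vec3]]:
    # poses: per-body world pose, e.g. read back from the physics engine each step.
    return {name: gaussian_centers(body, poses[name]) for name, body in scene.bodies.items()}
```

Binding Gaussians to mesh triangles rather than to free 3D positions means the same mesh can drive both collision in the physics engine and appearance in the renderer, so the two stay consistent as links move.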

Index terms

Deep Learning in Grasping and Manipulation, Grippers and Other End-Effectors, Imitation Learning