NavGSim: High-Fidelity Gaussian Splatting Simulator for Large-Scale Navigation
Jiahang Liu, Yuanxing Duan, Jiazhao Zhang, Minghan Li, Shaoan Wang, Zhizheng Zhang, He Wang
AI summary
Problem
Existing navigation simulators lack photorealistic rendering or require excessive manual effort to scale to large environments, hindering the training of robust embodied AI agents.
Approach
NavGSim leverages hierarchical 3D Gaussian Splatting for real-time, high-fidelity scene rendering and introduces a Gaussian slicing technique for efficient collision detection, all wrapped in a user-friendly Python API.
Key results
- Enables photorealistic rendering and collision detection for scenes spanning hundreds of square meters
- Provides a comprehensive Python API for custom scene reconstruction and policy training
- Fine-tuned VLA model achieves up to a 100% success rate on seen landmarks and generalizes well to unseen targets
- Successfully transfers simulated navigation policies to a real-world Unitree Go2 quadruped robot
Why it matters
Provides the robotics community with a scalable, photorealistic simulation platform to train and evaluate embodied AI policies that reliably transfer to physical robots.
Abstract
Simulating realistic environments for robots is widely recognized as a critical challenge in robot learning, particularly for rendering and physical simulation. The challenge is even greater in navigation tasks, where trajectories often span multiple rooms or entire floors. In this work, we present NavGSim, a Gaussian Splatting-based simulator for generating high-fidelity, large-scale navigation environments. Built on a hierarchical 3D Gaussian Splatting framework, NavGSim enables photorealistic rendering in expansive scenes spanning hundreds of square meters. To simulate navigation collisions, we introduce a Gaussian Splatting-based slicing technique that directly extracts navigable areas from the reconstructed Gaussians. For ease of use, we also provide comprehensive NavGSim APIs that support multi-GPU development, including tools for custom scene reconstruction, robot configuration, policy training, and evaluation. To evaluate NavGSim's effectiveness, we train a Vision-Language-Action (VLA) model on trajectories collected in NavGSim and assess its performance in both simulated and real-world environments. Our results show that NavGSim significantly improves the VLA model's scene understanding, enabling the policy to handle diverse navigation queries effectively. NavGSim is publicly available at: https://github.com/2003jiahang/NavGSim
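The abstract does not spell out how the slicing step turns Gaussians into navigable area. As a rough illustration only, the sketch below assumes one plausible reading: keep sufficiently opaque Gaussians whose centers fall inside a height band matching the robot's body, project them onto a 2D grid, and treat occupied cells as obstacles for collision checks. The function names (`slice_occupancy`, `in_collision`) and parameters (`z_min`, `z_max`, `cell`, `min_opacity`, `radius`) are hypothetical and are not part of the NavGSim API; the code uses only NumPy.

```python
import numpy as np


def slice_occupancy(centers, opacities, z_min, z_max, cell=0.05, min_opacity=0.5):
    """Hypothetical sketch: 2D occupancy grid from Gaussians inside a height slice.

    centers:   (N, 3) Gaussian centers (x, y, z), z pointing up.
    opacities: (N,) Gaussian opacities in [0, 1].
    Returns (grid, origin), where grid[i, j] is True if cell (i, j) is blocked.
    """
    centers = np.asarray(centers, dtype=float)
    opacities = np.asarray(opacities, dtype=float)
    # Keep only opaque Gaussians whose centers lie in the robot's height band.
    mask = (
        (centers[:, 2] >= z_min)
        & (centers[:, 2] <= z_max)
        & (opacities >= min_opacity)
    )
    xy = centers[mask, :2]
    # Grid bounds cover all Gaussians (not just the slice) so free floor is mapped too.
    origin = centers[:, :2].min(axis=0)
    extent = centers[:, :2].max(axis=0)
    shape = np.floor((extent - origin) / cell).astype(int) + 1
    grid = np.zeros(tuple(shape), dtype=bool)
    # Mark every cell that contains at least one sliced Gaussian center.
    idx = np.floor((xy - origin) / cell).astype(int)
    grid[idx[:, 0], idx[:, 1]] = True
    return grid, origin


def in_collision(grid, origin, cell, position, radius):
    """Hypothetical sketch: True if the robot disc at `position` hits an occupied cell.

    Positions outside the mapped area are conservatively treated as blocked.
    """
    pos = np.asarray(position, dtype=float)
    local = (pos - origin) / cell
    if np.any(local < 0) or np.any(local >= np.array(grid.shape)):
        return True
    # Centers of occupied cells in world coordinates.
    occupied = np.argwhere(grid)
    cell_centers = origin + (occupied + 0.5) * cell
    dists = np.linalg.norm(cell_centers - pos, axis=1)
    return bool(np.any(dists <= radius))
```

A usage example on a toy scene: with `centers=[[0, 0, 0.0], [1, 0, 0.3], [0, 1, 2.0], [1, 1, 0.3]]`, `opacities=[1, 1, 1, 0.1]`, a slice of `z_min=0.1`, `z_max=0.8`, and `cell=0.5`, only the Gaussian at `(1, 0, 0.3)` survives, so a robot disc of radius 0.4 centered at `(1.0, 0.0)` collides while one at `(0.0, 1.0)` does not. This sketch uses Gaussian centers only; a fuller version would also account for each Gaussian's scale, for example by inflating occupied cells.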