ICRA 2026

NavGSim: High-Fidelity Gaussian Splatting Simulator for Large-Scale Navigation

Jiahang Liu, Yuanxing Duan, Jiazhao Zhang, Minghan Li, Shaoan Wang, Zhizheng Zhang, He Wang

AI summary

A Gaussian Splatting-based simulator enables photorealistic, large-scale navigation training, allowing a vision-language-action model to successfully generalize from simulation to real-world quadruped robot deployment.

Gaussian Splatting, Robot Navigation, Simulation, Vision-Language-Action, Embodied AI, Collision Detection

Problem

Existing navigation simulators lack photorealistic rendering or require excessive manual effort to scale to large environments, hindering the training of robust embodied AI agents.

Approach

NavGSim leverages hierarchical 3D Gaussian Splatting for real-time, high-fidelity scene rendering and introduces a Gaussian slicing technique for efficient collision detection, all wrapped in a user-friendly Python API.
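As a rough illustration of the slicing idea, the sketch below marks which cells of a 2D grid are blocked by Gaussians that intersect a horizontal height band, then answers navigability queries against that grid. It is not the NavGSim API or the authors' algorithm: the `Gaussian` class, the isotropic `radius`, the opacity threshold, and both function names are hypothetical simplifications chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Hypothetical, simplified Gaussian: isotropic extent instead of a full covariance."""
    x: float
    y: float
    z: float
    radius: float   # assumed effective extent in meters
    opacity: float  # 0..1


def slice_occupancy(gaussians, z_min, z_max, cell, width, height, min_opacity=0.5):
    """Return a height x width grid; True marks cells blocked within [z_min, z_max]."""
    grid = [[False] * width for _ in range(height)]
    for g in gaussians:
        # Nearly transparent Gaussians are treated as free space (assumed threshold).
        if g.opacity < min_opacity:
            continue
        # Skip Gaussians that lie entirely above or below the slice band.
        if g.z + g.radius < z_min or g.z - g.radius > z_max:
            continue
        # Conservatively block every cell touched by the footprint's bounding square.
        c0 = max(0, math.floor((g.x - g.radius) / cell))
        c1 = min(width - 1, math.floor((g.x + g.radius) / cell))
        r0 = max(0, math.floor((g.y - g.radius) / cell))
        r1 = min(height - 1, math.floor((g.y + g.radius) / cell))
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] = True
    return grid


def is_navigable(grid, x, y, cell):
    """A position is navigable if it is inside the grid and its cell is not blocked."""
    r, c = math.floor(y / cell), math.floor(x / cell)
    if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
        return False
    return not grid[r][c]
```

Choosing the band to span roughly the robot's body height means ceilings and floor-level detail outside the band do not block motion, while walls and furniture inside it do. A real system would likely account for anisotropic covariances and the robot's footprint, which this sketch ignores.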

Key results

Enables photorealistic rendering and collision detection for scenes spanning hundreds of square meters
Provides a comprehensive Python API for custom scene reconstruction and policy training
Fine-tuned VLA model achieves up to 100% success rate on seen landmarks and strong generalization to unseen targets
Successfully transfers simulated navigation policies to a real-world Unitree Go2 quadruped robot

Why it matters

Provides the robotics community with a scalable, photorealistic simulation platform to train and evaluate embodied AI policies that reliably transfer to physical robots.

Abstract

Simulating realistic environments for robots is widely recognized as a critical challenge in robot learning, particularly in terms of rendering and physical simulation. This challenge becomes even more pronounced in navigation tasks, where trajectories often extend across multiple rooms or even entire floors. In this work, we present NavGSim, a Gaussian Splatting-based simulator designed to generate high-fidelity, large-scale navigation environments. Built upon a hierarchical 3D Gaussian Splatting framework, NavGSim enables photorealistic rendering in expansive scenes spanning hundreds of square meters. To simulate navigation collisions, we introduce a Gaussian Splatting-based slice technique that directly extracts navigable areas from reconstructed Gaussians. Additionally, for ease of use, we provide comprehensive NavGSim APIs supporting multi-GPU development, including tools for custom scene reconstruction, robot configuration, policy training, and evaluation. To evaluate NavGSim’s effectiveness, we train a Vision-Language-Action (VLA) model using trajectories collected from NavGSim and assess its performance in both simulated and real-world environments. Our results demonstrate that NavGSim significantly enhances the VLA model’s scene understanding, enabling the policy to handle diverse navigation queries effectively. NavGSim is publicly available at: https://github.com/2003jiahang/NavGSim

Index terms

Simulation and Animation; Integrated Planning and Learning; Learning from Demonstration