GaussianFormer3D: Multi-Modal Gaussian-Based Semantic Occupancy Prediction with 3D Deformable Attention
Lingjun Zhao, Sizhe Wei, James Hays, Lu Gan
AI summary
Problem
Current multi-modal occupancy prediction relies on dense 3D voxels, which are computationally expensive and suffer from redundant empty grids. Existing Gaussian-based methods, in turn, rely solely on 2D camera data, which limits accurate 3D geometric modeling and depth resolution.
Approach
The authors propose GaussianFormer3D, which initializes 3D Gaussians with LiDAR-derived geometry priors and refines them using a LiDAR-guided 3D deformable attention mechanism that aggregates fused LiDAR-camera features in a unified 3D space.
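As a rough illustration of how LiDAR geometry could seed Gaussian positions, the sketch below voxelizes a point cloud and uses the centroid of each occupied voxel as a candidate Gaussian mean. This is a minimal sketch under stated assumptions, not the authors' initialization strategy: the function name `voxel_to_gaussian_means`, the centroid rule, and the random subsampling step are all hypothetical choices made for illustration.

```python
from collections import defaultdict
import math
import random


def voxel_to_gaussian_means(points, voxel_size, num_gaussians, seed=0):
    """Return up to `num_gaussians` candidate Gaussian means from LiDAR points.

    Hypothetical helper: each occupied voxel contributes one mean, namely the
    centroid of the points that fall inside it. Empty voxels contribute nothing.
    """
    # Group points by the integer index of the voxel that contains them.
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))

    # One candidate mean per occupied voxel, in sorted voxel order.
    centroids = []
    for key in sorted(buckets):
        pts = buckets[key]
        n = len(pts)
        centroids.append(tuple(sum(p[i] for p in pts) / n for i in range(3)))

    # Assumption: if there are more occupied voxels than Gaussians,
    # subsample uniformly at random; otherwise return all centroids.
    if len(centroids) > num_gaussians:
        centroids = random.Random(seed).sample(centroids, num_gaussians)
    return centroids


# Toy point cloud: two points share a voxel, one point sits alone.
toy_points = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.2, 0.0, 0.0)]
toy_means = voxel_to_gaussian_means(toy_points, voxel_size=0.5, num_gaussians=10)
```

Seeding means only where LiDAR returns exist keeps the Gaussians away from empty space, which is the intuition behind a geometry prior; the remaining Gaussian properties (scale, rotation, semantics) are left out of this sketch.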
Key results
- State-of-the-art performance on nuScenes-SurroundOcc and Occ3D datasets
- Substantial accuracy gains on small objects and large surfaces
- Reduced memory consumption and improved inference efficiency
- Strong generalization to off-road environments with single-frame input
Why it matters
Enables more accurate, efficient, and robust 3D scene understanding for autonomous driving and robotic navigation by leveraging complementary LiDAR-camera data through a compact Gaussian representation.
Abstract
3D semantic occupancy prediction is essential for achieving safe, reliable autonomous driving and robotic navigation. Compared to camera-only perception systems, multi-modal pipelines, especially LiDAR-camera fusion methods, can produce more accurate and fine-grained predictions. Although voxel-based scene representations are widely used for semantic occupancy prediction, 3D Gaussians have emerged as a continuous and significantly more compact alternative. In this work, we propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention, namely GaussianFormer3D. We introduce a voxel-to-Gaussian initialization strategy that provides 3D Gaussians with accurate geometry priors from LiDAR data, and design a LiDAR-guided 3D deformable attention mechanism to refine these Gaussians using LiDAR-camera fusion features in a lifted 3D space. Extensive experiments on real-world on-road and off-road autonomous driving datasets demonstrate that GaussianFormer3D achieves state-of-the-art prediction performance with reduced memory consumption and improved efficiency. Project website: https://lunarlab-gatech.github.io/GaussianFormer3D/.
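The general idea of deformable attention in 3D can be sketched as follows: for a reference point (for example, a Gaussian mean), sample a feature volume at a few offset locations with trilinear interpolation and combine the samples with softmax-normalized weights. This is a minimal sketch of the generic technique, not the paper's LiDAR-guided mechanism. It assumes a scalar feature volume stored as nested lists and takes the offsets and attention logits as inputs, whereas a real model would predict them from the query; the names `trilinear_sample` and `deformable_attention_3d` are hypothetical.

```python
import math


def trilinear_sample(volume, x, y, z):
    """Trilinearly interpolate volume[ix][iy][iz] at continuous (x, y, z).

    Coordinates are in voxel-index units and are clamped to the volume bounds.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    x = min(max(x, 0.0), nx - 1)
    y = min(max(y, 0.0), ny - 1)
    z = min(max(z, 0.0), nz - 1)
    x0, y0, z0 = int(math.floor(x)), int(math.floor(y)), int(math.floor(z))
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    fx, fy, fz = x - x0, y - y0, z - z0

    # Weighted sum over the eight surrounding grid corners.
    value = 0.0
    for ix, wx in ((x0, 1.0 - fx), (x1, fx)):
        for iy, wy in ((y0, 1.0 - fy), (y1, fy)):
            for iz, wz in ((z0, 1.0 - fz), (z1, fz)):
                value += wx * wy * wz * volume[ix][iy][iz]
    return value


def deformable_attention_3d(volume, reference, offsets, logits):
    """Aggregate features sampled at reference + offset, weighted by softmax(logits)."""
    # Numerically stable softmax over the sampling points.
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    rx, ry, rz = reference
    return sum(
        w * trilinear_sample(volume, rx + dx, ry + dy, rz + dz)
        for w, (dx, dy, dz) in zip(weights, offsets)
    )


# Toy 3x3x3 volume whose value equals the x index, so sampling is easy to check.
toy_volume = [[[float(ix) for _ in range(3)] for _ in range(3)] for ix in range(3)]
toy_output = deformable_attention_3d(
    toy_volume,
    reference=(1.0, 1.0, 1.0),
    offsets=[(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)],
    logits=[0.0, 0.0],
)
```

Because each query touches only a handful of sampled locations rather than every voxel, the cost scales with the number of queries and sampling points instead of the full grid size, which is the usual motivation for deformable attention over dense attention.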