SaLF: Sparse Local Fields for Multi-Sensor Rendering in Real-Time
Yun Chen, Matthew Haines, Jingkang Wang, Sahil Jain, Krzysztof Baron-Lis, Siva Manivasagam, Ze Yang, Raquel Urtasun
AI summary
Problem
Existing neural rendering methods for autonomous driving simulation are either slow (NeRF-based) or limited to pinhole cameras (3D Gaussian Splatting), lacking a unified, efficient representation that supports complex multi-sensor models like rolling-shutter LiDARs and fisheye cameras.
Approach
SaLF represents scenes as a sparse grid of 3D voxel primitives, where each voxel contains a local implicit field, enabling both efficient tile-based rasterization for cameras and accelerated octree-based ray-casting for LiDAR and complex sensors.
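The representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SparseVoxelGrid`, the dictionary-based storage, and the choice of a trilinearly interpolated density as the "local field" are all assumptions made for illustration. The source states only that each voxel holds a local implicit field.

```python
import math


class SparseVoxelGrid:
    """Hypothetical sparse grid: only occupied voxels are stored."""

    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        # (i, j, k) -> 8 corner densities. Assumed form of the local field.
        self.voxels = {}

    def insert(self, index, corner_densities):
        if len(corner_densities) != 8:
            raise ValueError("expected one density per voxel corner (8 values)")
        self.voxels[index] = list(corner_densities)

    def query_density(self, point):
        # Locate the voxel that contains the query point.
        scaled = [c / self.voxel_size for c in point]
        index = tuple(math.floor(s) for s in scaled)
        corners = self.voxels.get(index)
        if corners is None:
            return 0.0  # Empty space costs no storage and evaluates to zero.

        # Local coordinates in [0, 1) inside the voxel.
        u, v, w = (s - i for s, i in zip(scaled, index))

        # Trilinear interpolation. Corner order: bit 0 = x, bit 1 = y, bit 2 = z.
        density = 0.0
        for corner, value in enumerate(corners):
            wx = u if corner & 1 else 1.0 - u
            wy = v if corner & 2 else 1.0 - v
            wz = w if corner & 4 else 1.0 - w
            density += wx * wy * wz * value
        return density


grid = SparseVoxelGrid(voxel_size=2.0)
# Density rises linearly from 0 to 1 along x inside this single voxel.
grid.insert((0, 0, 0), [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
inside = grid.query_density((1.0, 0.5, 0.5))
outside = grid.query_density((5.0, 5.0, 5.0))
```

Because the field is evaluated per voxel, the same stored data could in principle be consumed either by a rasterizer that projects voxels to the image plane or by a ray-caster that visits voxels along a ray, which is the interoperability the summary highlights.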
Key results
- Real-time rendering at 50+ FPS for cameras and 600+ FPS for LiDAR
- Unified support for complex sensors like rolling-shutter LiDARs and fisheye cameras
- Training under 30 minutes with adaptive voxel pruning and densification
- Comparable photorealism and smaller downstream domain gap than state-of-the-art simulators
Why it matters
Provides a unified, efficient foundation for scalable, high-fidelity multi-sensor simulation critical for testing and training autonomous driving systems.
Abstract
High-fidelity sensor simulation of light-based sensors such as cameras and LiDARs is critical for safe and accurate autonomy testing. Neural radiance field (NeRF)-based methods that reconstruct sensor observations via ray-casting of implicit representations have demonstrated accurate simulation of driving scenes, but are slow to train and render, hampering scalability. 3D Gaussian Splatting (3DGS) has demonstrated faster training and rendering times through rasterization, but is primarily restricted to pinhole camera sensors, preventing usage for realistic multi-sensor autonomy evaluation. Moreover, both NeRF and 3DGS couple the representation with the rendering procedure (implicit networks for ray-based evaluation, particles for rasterization), preventing interoperability, which is key for general usage. In this work, we present Sparse Local Fields (SaLF), a novel volumetric representation that supports rasterization and raytracing for unified multi-sensor simulation. SaLF represents volumes as a sparse set of 3D voxel primitives, where each voxel is a local implicit field. SaLF has fast training (<30 min) and rendering capabilities (50+ FPS for camera and 600+ FPS for LiDAR), has adaptive pruning and densification to easily handle large scenes, and can support non-pinhole cameras and spinning LiDARs. We demonstrate that SaLF delivers realism comparable to existing self-driving sensor simulation methods while providing an efficient and versatile foundation for more scalable simulation.
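To make the ray-based rendering path concrete, the following is a simplified sketch of casting one ray through a sparse voxel set and compositing front to back. It rests on several assumptions that are not taken from the paper: the function name `composite_ray` is hypothetical, each voxel stores a single constant density and scalar color, and fixed-step marching with a dictionary lookup stands in for the accelerated octree traversal the paper describes. The compositing rule itself (alpha = 1 - exp(-density * step), weighted by accumulated transmittance) is the standard volume rendering formulation used by NeRF-style methods.

```python
import math


def composite_ray(origin, direction, occupied, voxel_size, step, t_max):
    """March a ray through sparse voxels and composite front to back.

    occupied maps (i, j, k) -> (density, color). Constant values per voxel
    are a simplification chosen for this sketch.
    Returns (color, expected_depth, opacity).
    """
    transmittance = 1.0
    color = 0.0
    depth = 0.0
    t = 0.5 * step  # Sample at the midpoint of each step.
    while t < t_max and transmittance > 1e-4:
        point = [o + t * d for o, d in zip(origin, direction)]
        index = tuple(math.floor(c / voxel_size) for c in point)
        sample = occupied.get(index)
        if sample is not None:  # Empty voxels are skipped without evaluation.
            density, voxel_color = sample
            alpha = 1.0 - math.exp(-density * step)
            weight = transmittance * alpha
            color += weight * voxel_color
            depth += weight * t  # Expected depth, the LiDAR-style output.
            transmittance *= 1.0 - alpha
        t += step
    return color, depth, 1.0 - transmittance


# One nearly opaque voxel two units along the x axis.
scene = {(2, 0, 0): (1000.0, 0.8)}
hit = composite_ray((0.0, 0.5, 0.5), (1.0, 0.0, 0.0), scene, 1.0, 0.1, 5.0)
miss = composite_ray((0.0, 0.5, 0.5), (1.0, 0.0, 0.0), {}, 1.0, 0.1, 5.0)
```

Since every ray carries its own origin and direction, per-ray effects such as rolling shutter or fisheye distortion only change how rays are generated, not how they are composited, which is why a ray-casting path suits the non-pinhole sensors mentioned in the abstract.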