ICRA 2026

SaLF: Sparse Local Fields for Multi-Sensor Rendering in Real-Time

Yun Chen, Matthew Haines, Jingkang Wang, Sahil Jain, Krzysztof Baron-Lis, Siva Manivasagam, Ze Yang, Raquel Urtasun


AI summary


SaLF unifies rasterization and ray-tracing in a single sparse voxel representation, enabling real-time, high-fidelity simulation of both cameras and LiDARs for autonomous driving.

Sparse Local Fields, Neural Rendering, Multi-Sensor Simulation, Real-Time Rendering, Autonomous Driving, Volumetric Representation

Problem

Existing neural rendering methods for autonomous driving simulation are either slow to train and render (NeRF-based) or largely restricted to pinhole cameras (3D Gaussian Splatting). Neither offers a unified, efficient representation that supports complex sensor models such as rolling-shutter LiDARs and fisheye cameras.
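
To make the sensor-model gap concrete, the sketch below generates rays for one sweep of a spinning LiDAR on a moving vehicle. It is an illustration, not code from the paper: the function name `spinning_lidar_rays`, its parameters, and the constant-velocity, no-rotation ego-motion model are all assumptions. Because each azimuth column fires at a different time, every column gets its own ray origin, so the sweep cannot be described by a single pinhole projection and is more naturally handled by per-ray casting.

```python
import math


def spinning_lidar_rays(num_azimuth, elevations_deg, sweep_duration, velocity):
    """Generate (timestamp, origin, direction) tuples for one LiDAR sweep.

    Assumptions (hypothetical, for illustration only):
      - the sensor spins at a constant rate and completes one revolution
        in `sweep_duration` seconds;
      - the vehicle moves at constant `velocity` (m/s) without rotating.
    """
    rays = []
    for col in range(num_azimuth):
        frac = col / num_azimuth
        timestamp = frac * sweep_duration      # each column fires later
        azimuth = 2.0 * math.pi * frac
        # Sensor position at the moment this column fires.
        origin = tuple(v * timestamp for v in velocity)
        for elev_deg in elevations_deg:
            elev = math.radians(elev_deg)
            direction = (
                math.cos(elev) * math.cos(azimuth),
                math.cos(elev) * math.sin(azimuth),
                math.sin(elev),
            )
            rays.append((timestamp, origin, direction))
    return rays


if __name__ == "__main__":
    # 4 azimuth columns, 2 beams, 0.1 s sweep, vehicle moving at 10 m/s in x.
    for ray in spinning_lidar_rays(4, [-5.0, 5.0], 0.1, (10.0, 0.0, 0.0)):
        print(ray)
```

In this toy setup the ray origins drift along x over the sweep, which is the rolling-shutter effect a multi-sensor renderer has to reproduce.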

Approach

SaLF represents scenes as a sparse grid of 3D voxel primitives, where each voxel contains a local implicit field, enabling both efficient tile-based rasterization for cameras and accelerated octree-based ray-casting for LiDAR and complex sensors.
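
A minimal sketch of the idea, under stated assumptions, is shown here. Nothing in it is taken from the SaLF implementation: the `Voxel` and `SparseVoxelGrid` classes, the linear per-voxel density field, the single grey value per voxel, and the uniform ray marching (used in place of octree-accelerated traversal) are all simplifications chosen for illustration. It covers only the ray-casting path: occupied voxels live in a hash map, each voxel evaluates its own local field, and samples along a ray are alpha-composited into a color and an expected depth, the latter being the kind of quantity a LiDAR simulator would use.

```python
import math
from dataclasses import dataclass


@dataclass
class Voxel:
    """One sparse primitive with a hypothetical linear local density field."""
    base_density: float
    gradient: tuple      # density change per unit local coordinate (x, y, z)
    color: float         # single grey value, for brevity

    def density(self, local):
        # `local` holds coordinates in [0, 1) inside the voxel.
        d = self.base_density + sum(g * u for g, u in zip(self.gradient, local))
        return max(d, 0.0)


class SparseVoxelGrid:
    """Stores only occupied voxels, keyed by integer grid index."""

    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        self.voxels = {}

    def insert(self, index, voxel):
        self.voxels[index] = voxel

    def query(self, point):
        idx = tuple(math.floor(c / self.voxel_size) for c in point)
        voxel = self.voxels.get(idx)
        if voxel is None:
            return 0.0, 0.0          # empty space: no density, no color
        local = tuple(c / self.voxel_size - i for c, i in zip(point, idx))
        return voxel.density(local), voxel.color


def render_ray(grid, origin, direction, t_far, step):
    """Alpha-composite samples along a ray with a unit-length direction.

    Returns (color, expected_depth, opacity). Uniform stepping stands in
    for octree traversal, which would skip empty space instead.
    """
    transmittance, color, depth = 1.0, 0.0, 0.0
    t = 0.5 * step
    while t < t_far and transmittance > 1e-4:
        point = tuple(o + t * d for o, d in zip(origin, direction))
        sigma, c = grid.query(point)
        alpha = 1.0 - math.exp(-sigma * step)
        weight = transmittance * alpha
        color += weight * c
        depth += weight * t
        transmittance *= 1.0 - alpha
        t += step
    return color, depth, 1.0 - transmittance


if __name__ == "__main__":
    grid = SparseVoxelGrid(voxel_size=1.0)
    grid.insert((2, 0, 0), Voxel(50.0, (0.0, 0.0, 0.0), 0.8))
    print(render_ray(grid, (0.0, 0.5, 0.5), (1.0, 0.0, 0.0), 5.0, 0.01))
```

The tile-based rasterization path for cameras is not sketched here.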

Key results

Real-time rendering at 50+ FPS for cameras and 600+ FPS for LiDAR
Unified support for complex sensors like rolling-shutter LiDARs and fisheye cameras
Training in under 30 minutes, with adaptive voxel pruning and densification
Photorealism comparable to, and a smaller downstream domain gap than, state-of-the-art simulators
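
As a quick sanity check on the throughput figures, the snippet below converts frame rates into per-frame time budgets. The helper `frame_time_ms` is hypothetical, and because the reported rates are lower bounds (50+ and 600+ FPS), the resulting times are upper bounds.

```python
def frame_time_ms(fps: float) -> float:
    """Per-frame time budget in milliseconds for a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


camera_ms = frame_time_ms(50.0)    # upper bound per camera frame
lidar_ms = frame_time_ms(600.0)    # upper bound per LiDAR frame
print(f"camera: {camera_ms:.2f} ms, LiDAR: {lidar_ms:.2f} ms")
```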

Why it matters

Provides a unified, efficient foundation for scalable, high-fidelity multi-sensor simulation critical for testing and training autonomous driving systems.

Abstract

High-fidelity sensor simulation of light-based sensors such as cameras and LiDARs is critical for safe and accurate autonomy testing. Neural radiance field (NeRF)-based methods that reconstruct sensor observations via ray-casting of implicit representations have demonstrated accurate simulation of driving scenes, but are slow to train and render, hampering scalability. 3D Gaussian Splatting (3DGS) has demonstrated faster training and rendering times through rasterization, but is primarily restricted to pinhole camera sensors, preventing usage for realistic multi-sensor autonomy evaluation. Moreover, both NeRF and 3DGS couple the representation with the rendering procedure (implicit networks for ray-based evaluation, particles for rasterization), preventing interoperability, which is key for general usage. In this work, we present Sparse Local Fields (SaLF), a novel volumetric representation that supports rasterization and raytracing for unified multi-sensor simulation. SaLF represents volumes as a sparse set of 3D voxel primitives, where each voxel is a local implicit field. SaLF has fast training (<30 min) and rendering capabilities (50+ FPS for camera and 600+ FPS for LiDAR), has adaptive pruning and densification to easily handle large scenes, and can support non-pinhole cameras and spinning LiDARs. We demonstrate that SaLF delivers realism comparable to existing self-driving sensor simulation methods while providing an efficient and versatile foundation for more scalable simulation.

Index terms

Computer Vision for Automation, Autonomous Vehicle Navigation, Simulation and Animation