ICRA 2026

SiLVR: Scalable Lidar-Visual Radiance Field Reconstruction with Uncertainty Quantification

Yifu Tao, Maurice Fallon

AI summary

SiLVR fuses lidar and camera data into a scalable neural radiance field pipeline that quantifies epistemic uncertainty to filter artifacts and enable reliable large-scale 3D reconstruction.

NeRF, lidar-visual fusion, uncertainty quantification, large-scale mapping, submapping, 3D reconstruction

Problem

Neural radiance fields (NeRFs) struggle with geometric accuracy in textureless or sparsely observed areas, lack built-in uncertainty measures, and face severe memory bottlenecks when scaling to large environments.

Approach

The system integrates lidar depth and surface normal constraints into a NeRF framework, partitions scenes using visibility-based clustering, and estimates epistemic uncertainty via spatial variance of a perturbation field to identify unreliable regions.
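As a rough illustration of how lidar constraints can enter a NeRF training objective, the sketch below adds a depth term and a surface normal term to a photometric loss. The function names, the L1 and cosine forms of the losses, and the weights `w_depth` and `w_normal` are assumptions made for illustration; this summary does not state the exact loss formulation used in SiLVR.

```python
import numpy as np

def lidar_depth_loss(rendered_depth, lidar_depth, valid):
    """Mean absolute depth error over rays that have a lidar return."""
    rendered_depth = np.asarray(rendered_depth, dtype=float)
    lidar_depth = np.asarray(lidar_depth, dtype=float)
    valid = np.asarray(valid, dtype=bool)
    if not valid.any():
        return 0.0  # no lidar supervision available for this batch
    return float(np.abs(rendered_depth[valid] - lidar_depth[valid]).mean())

def normal_loss(rendered_normals, lidar_normals):
    """Mean (1 - cosine similarity) between rendered and lidar-derived normals."""
    a = np.asarray(rendered_normals, dtype=float)
    b = np.asarray(lidar_normals, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float((1.0 - (a * b).sum(axis=1)).mean())

def total_loss(photometric, depth, normal, w_depth=0.1, w_normal=0.01):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return photometric + w_depth * depth + w_normal * normal
```

In textureless regions the photometric term alone gives weak gradients for geometry, so the depth and normal terms are what pull the surface toward the lidar measurements.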

Key results

Uncertainty-aware lidar-visual NeRF reconstruction pipeline
Epistemic uncertainty quantification using perturbation field spatial variance
Visibility-based submapping strategy to minimize boundary artifacts
Validated on 20,000+ m² datasets with millimeter-accurate ground truth
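One way to picture how an uncertainty estimate is used downstream is the minimal sketch below: points whose uncertainty exceeds a threshold are dropped from each submap before the submaps are concatenated. It assumes a per-point scalar uncertainty is already available; `filter_by_uncertainty`, `merge_submaps`, and the threshold value are hypothetical names and parameters, not part of the SiLVR code.

```python
import numpy as np

def filter_by_uncertainty(points, uncertainty, threshold):
    """Keep only the 3D points whose uncertainty is at or below the threshold."""
    points = np.asarray(points, dtype=float).reshape(-1, 3)
    uncertainty = np.asarray(uncertainty, dtype=float)
    keep = uncertainty <= threshold
    return points[keep]

def merge_submaps(submaps, threshold):
    """Filter each (points, uncertainty) submap, then stack the survivors."""
    kept = [filter_by_uncertainty(p, u, threshold) for p, u in submaps]
    return np.concatenate(kept, axis=0)

# Usage: the second point of submap A is poorly observed and is removed.
submap_a = ([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [0.1, 0.9])
submap_b = ([[2.0, 0.0, 0.0]], [0.2])
merged = merge_submaps([submap_a, submap_b], threshold=0.5)
```

Filtering before merging means that artifacts near a submap boundary, where observations are limited, are less likely to overwrite well-observed geometry from the neighbouring submap.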

Why it matters

Provides robotics and mapping communities with a reliable, scalable tool for generating photorealistic 3D maps with quantifiable confidence, critical for autonomous navigation and inspection.

Abstract

We present a neural radiance field (NeRF)-based large-scale reconstruction system that fuses lidar and vision data to generate high-quality reconstructions that are geometrically accurate and capture photorealistic texture. Our system adopts the state-of-the-art NeRF representation to additionally incorporate lidar. Adding lidar data adds strong geometric constraints on the depth and surface normals, which is particularly useful when modeling uniform texture surfaces which contain ambiguous visual reconstruction cues. A key contribution of this work is a novel method to quantify the epistemic uncertainty of the lidar-visual NeRF reconstruction by estimating the spatial variance of each point location in the radiance field given the sensor observations from the cameras and lidar. This provides a principled approach to evaluate the contribution of each sensor modality to the final reconstruction. In this way, reconstructions that are uncertain (due to, e.g., uniform visual texture, limited observation viewpoints, or little lidar coverage) can be identified and removed. Our system is integrated with a real-time pose-graph lidar simultaneous localisation and mapping (SLAM) system, which is used to bootstrap a structure-from-motion reconstruction procedure. It also helps to properly constrain the overall metric scale, which is essential for the lidar depth loss. The refined SLAM trajectory can then be divided into submaps using spectral clustering to group sets of covisible images together. This submapping approach is more suitable for visual reconstruction than distance-based partitioning. Our uncertainty estimation is particularly effective when merging submaps, as their boundaries often contain artifacts due to limited observations. We demonstrate the reconstruction system using a multicamera, lidar sensor suite in experiments involving both robot-mounted and handheld scanning. Our test datasets cover a total area of more than 20 000 m², including multiple university buildings and an aerial survey of a multistorey building. Quantitative evaluation is provided by comparing with maps produced by a commercial tripod scanner.
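The covisibility-based submapping described in the abstract can be sketched as spectral clustering on a graph whose nodes are images and whose edge weights measure shared visibility. The example below is a minimal two-way split using the Fiedler vector of the normalized graph Laplacian. Defining the affinity as the number of landmarks two images both observe, and splitting into exactly two groups, are assumptions made for illustration; the abstract does not specify the affinity measure or the number of clusters.

```python
import numpy as np

def covisibility_matrix(observations):
    """Affinity W[i, j] = number of landmark IDs seen by both image i and image j."""
    n = len(observations)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = len(observations[i] & observations[j])
    return W

def spectral_bipartition(affinity):
    """Split the graph into two groups by the sign of the Fiedler vector."""
    W = np.asarray(affinity, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    fiedler = eigvecs[:, 1]         # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Usage: images 0-2 view one set of landmarks, images 3-5 a mostly different set.
room_a = set(range(0, 10))
room_b = set(range(8, 18))
labels = spectral_bipartition(covisibility_matrix([room_a] * 3 + [room_b] * 3))
```

Unlike a distance-based split, this grouping depends only on what the images see in common, so two nearby cameras facing opposite sides of a wall can end up in different submaps.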

Index terms

Mapping, SLAM, Sensor Fusion, Neural Radiance Field