SiLVR: Scalable Lidar-Visual Radiance Field Reconstruction with Uncertainty Quantification
Yifu Tao, Maurice Fallon
AI summary
Problem
Neural radiance fields (NeRFs) struggle with geometric accuracy in textureless or sparsely observed areas, lack built-in uncertainty measures, and face severe memory bottlenecks when scaling to large environments.
Approach
The system integrates lidar depth and surface-normal constraints into a NeRF framework, partitions scenes into submaps using visibility-based (spectral) clustering of covisible images, and estimates epistemic uncertainty from the spatial variance of a perturbation field to identify unreliable regions.
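The lidar constraints can be made concrete with a minimal Python/NumPy sketch of how depth and surface-normal terms are commonly added to a NeRF photometric loss. It assumes an L1 error on the expected ray-termination depth, a (1 - cosine) error on normals, and hypothetical weights `lambda_depth` and `lambda_normal`. The function names are illustrative and are not taken from the SiLVR code.

```python
import numpy as np


def render_depth(weights: np.ndarray, t_vals: np.ndarray) -> np.ndarray:
    """Expected ray-termination depth, sum_i w_i * t_i, for each ray.

    weights, t_vals: arrays of shape (num_rays, num_samples).
    """
    return np.sum(weights * t_vals, axis=-1)


def lidar_depth_loss(pred_depth, lidar_depth, valid_mask) -> float:
    """Mean absolute depth error over rays with a lidar return (assumed L1 form)."""
    if not np.any(valid_mask):
        return 0.0
    return float(np.mean(np.abs(pred_depth[valid_mask] - lidar_depth[valid_mask])))


def normal_loss(pred_normals, lidar_normals, valid_mask) -> float:
    """Mean (1 - cosine similarity) between rendered and lidar-derived normals."""
    if not np.any(valid_mask):
        return 0.0
    p = pred_normals[valid_mask]
    q = lidar_normals[valid_mask]
    # Normalize both sets of normals so the dot product is a cosine.
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    cos = np.sum(p * q, axis=-1)
    return float(np.mean(1.0 - cos))


def total_loss(rgb_loss, depth_loss, n_loss, lambda_depth=0.1, lambda_normal=0.05):
    # Hypothetical weighting; the abstract does not specify these values.
    return rgb_loss + lambda_depth * depth_loss + lambda_normal * n_loss
```

Because the depth term compares rendered and lidar depths in metres, the camera poses must be metrically scaled, which is why the abstract stresses that the lidar SLAM system constrains the overall scale.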
Key results
- Uncertainty-aware lidar-visual NeRF reconstruction pipeline
- Epistemic uncertainty quantification using the spatial variance of a perturbation field
- Visibility-based submapping strategy to minimize boundary artifacts
- Validated on 20,000+ m² datasets with millimeter-accurate ground truth
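The abstract does not spell out how the spatial variance of the perturbation field is computed. The sketch below shows one standard way to obtain such a variance, assuming a diagonal Laplace approximation. Each grid cell carries a hypothetical 3D offset that deforms sample positions. The Hessian of the reconstruction loss with respect to those offsets is approximated by accumulated squared Jacobians (Gauss-Newton), and the posterior variance is the inverse of that Hessian plus a prior precision. The Jacobians are taken as given. The merging rule (keep the estimate from the lower-uncertainty submap) is likewise an illustrative assumption, not necessarily the paper's formulation.

```python
import numpy as np


def perturbation_variance(jacobians, cell_ids, num_cells, prior_precision=1.0):
    """Diagonal Laplace approximation of the variance of a per-cell 3D perturbation.

    jacobians: (num_residuals, 3) derivative of each observation residual with
               respect to the 3D offset of the cell it touches (assumed given).
    cell_ids:  (num_residuals,) grid-cell index each residual depends on.
    Returns a (num_cells, 3) array of per-axis posterior variances.
    """
    hessian_diag = np.zeros((num_cells, 3))
    # Gauss-Newton: accumulate squared Jacobians per cell and axis
    # (np.add.at handles repeated cell indices correctly).
    np.add.at(hessian_diag, cell_ids, jacobians ** 2)
    return 1.0 / (hessian_diag + prior_precision)


def spatial_uncertainty(variances):
    """Scalar uncertainty per cell: trace of the diagonal 3x3 covariance."""
    return variances.sum(axis=-1)


def filter_points(point_cells, uncertainty, threshold):
    """Boolean mask keeping points whose cell uncertainty is below the threshold."""
    return uncertainty[point_cells] < threshold


def merge_submaps(values_a, unc_a, values_b, unc_b):
    """Hypothetical merge rule for points seen by two submaps.

    values_*: (N, C) per-point estimates; unc_*: (N,) uncertainties.
    Keeps the estimate from whichever submap is less uncertain.
    """
    choose_a = unc_a <= unc_b
    return np.where(choose_a[:, None], values_a, values_b)
```

Cells that no camera ray or lidar return constrains fall back to the prior variance `1 / prior_precision`, so sparsely observed regions such as submap boundaries receive high uncertainty and can be filtered out or overridden when submaps are merged.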
Why it matters
Provides the robotics and mapping communities with a reliable, scalable tool for generating photorealistic 3D maps with quantifiable confidence, which is critical for autonomous navigation and inspection.
Abstract
We present a neural radiance field (NeRF)-based large-scale reconstruction system that fuses lidar and vision data to generate high-quality reconstructions that are geometrically accurate and capture photorealistic texture. Our system adopts a state-of-the-art NeRF representation and extends it to incorporate lidar. Lidar data adds strong geometric constraints on depth and surface normals, which is particularly useful when modeling uniformly textured surfaces that provide ambiguous visual reconstruction cues. A key contribution of this work is a novel method to quantify the epistemic uncertainty of the lidar-visual NeRF reconstruction by estimating the spatial variance of each point location in the radiance field given the sensor observations from the cameras and lidar. This provides a principled approach to evaluate the contribution of each sensor modality to the final reconstruction. In this way, reconstructions that are uncertain (due to, e.g., uniform visual texture, limited observation viewpoints, or little lidar coverage) can be identified and removed. Our system is integrated with a real-time pose-graph lidar simultaneous localisation and mapping (SLAM) system, which is used to bootstrap a structure-from-motion reconstruction procedure. The SLAM system also constrains the overall metric scale, which is essential for the lidar depth loss. The refined SLAM trajectory can then be divided into submaps using spectral clustering to group sets of covisible images together. This submapping approach is more suitable for visual reconstruction than distance-based partitioning. Our uncertainty estimation is particularly effective when merging submaps, as their boundaries often contain artifacts due to limited observations. We demonstrate the reconstruction system using a multicamera lidar sensor suite in experiments involving both robot-mounted and handheld scanning.
Our test datasets cover a total area of more than 20,000 m², including multiple university buildings and an aerial survey of a multistorey building. Quantitative evaluation is provided by comparing with maps produced by a commercial tripod scanner.
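The covisibility-based submapping described in the abstract can be sketched with normalized spectral clustering. This sketch assumes an affinity matrix such as the number of shared feature tracks between image pairs, a user-chosen number of submaps, and a small deterministic k-means on the spectral embedding. None of these details are given in the abstract, and the function name `spectral_submaps` is hypothetical.

```python
import numpy as np


def spectral_submaps(covisibility, num_submaps, num_iters=50):
    """Partition images into submaps by spectral clustering of a covisibility graph.

    covisibility: (N, N) non-negative matrix, e.g. shared feature tracks between
                  image i and image j (an assumed choice of affinity).
    Returns an (N,) array of integer submap labels.
    """
    W = np.asarray(covisibility, dtype=float)
    W = 0.5 * (W + W.T)  # enforce symmetry
    np.fill_diagonal(W, 0.0)  # no self-affinity
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    U = eigvecs[:, :num_submaps]
    # Row-normalize the spectral embedding.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    # Deterministic farthest-point initialization for k-means.
    centers = [U[0]]
    for _ in range(1, num_submaps):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)

    # Lloyd iterations: assign each image to its nearest center, then recompute centers.
    for _ in range(num_iters):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new_centers = np.array([
            U[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(num_submaps)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Unlike a distance-based split, two images taken close together but facing opposite sides of a wall share few features. They therefore receive low affinity and can fall into different submaps.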