← Back ICRA 2023

Multimodal Neural Radiance Field

Haidong Zhu, Yuyin Sun, Chi Liu, Lu Xia, Jiajia Luo, Nan Qiao, Ram Nevatia, CHENG-HAO KUO

PDF

Abstract

This paper addresses the challenge of reconstruct- ing a scene with a neural radiance field (NeRF) for robot vision and scene understanding using multiple modalities. Researchers have introduced the use of NeRF to represent an object for synthesizing and rendering novel views of complex scenes by optimizing a 3-D radiance field for ray casting and rendering for 2-D RGB images. However, using RGB images alone introduces additional geometry ambiguities with transparent objects or complex scenes and cannot accurately depict the 3-D shapes. We discuss and solve this problem and use multiple modalities as input for the same NeRF model to build a multimodal NeRF by incorporating point clouds and infrared image supervision to prevent such bias. In contrast to RGB images, infrared images and point clouds are typically taken by separate cameras that cannot be aligned with the RGB camera. We further introduce the alignment of different modalities based on point cloud registration to estimate the relative transformation matrices between them before training a NeRF model with multiple modalities. We evaluate our model on chosen scenes from the ScanNet and M2DGR datasets and demonstrate that it outperforms existing state-of-the-art methods.

Index terms

Cognitive Modeling Visual Learning RGB-D Perception