ICRA 2024

Dusk Till Dawn: Self-Supervised Nighttime Stereo Depth Estimation Using Visual Foundation Models

Madhu Vankadari, Samuel Hodgson, Sangyun Shin, Kaichen Zhou, Andrew Markham, Niki Trigoni

Abstract

Self-supervised depth estimation algorithms rely heavily on frame-warping relationships, exhibiting substantial performance degradation when applied in challenging circumstances, such as low-visibility and nighttime scenarios with varying illumination conditions. Addressing this challenge, we introduce an algorithm designed to achieve accurate self-supervised stereo depth estimation focusing on nighttime conditions. Specifically, we use pretrained visual foundation models to extract generalised features across challenging scenes and present an efficient method for matching and integrating these features from stereo frames. Moreover, to prevent pixels violating the photometric consistency assumption from negatively affecting the depth predictions, we propose a novel masking approach designed to filter out such pixels. Lastly, addressing weaknesses in the evaluation of current depth estimation algorithms, we present novel evaluation metrics. Our experiments, conducted on challenging datasets including Oxford RobotCar and Multi-Spectral Stereo, demonstrate the robust improvements realized by our approach.

Index terms

Mapping, SLAM, Deep Learning for Visual Perception