Generalizable Thermal-Based Depth Estimation Via Pre-Trained Visual Foundation Model
Ruoyu Fan, Wang Zhao, Matthieu Lin, Qi Wang, Yong-Jin Liu, Wenping Wang
Abstract
Depth estimation is a crucial task in computer vision, applicable to various domains such as 3D reconstruction, robotics, and autonomous driving. In particular, thermal-based depth estimation has unique advantages, including night-time vision. However, the existing depth estimation method remains challenging in robust generalization due to limited data re- sources and spectral differences between thermal and RGB images. In this paper, we present a self-supervised approach to enhance thermal-based depth estimation by leveraging pre- trained visual models initially designed for RGB data. In detail, we design a novel two-stage training strategy, incorporating Low-rank Adapters and Convolutional Adapters, which not only significantly improves accuracy and robustness but also enables impressive zero-shot generalization capabilities. Our method outperforms existing thermal-based depth estimation models, opening new possibilities for cross-modal applications in computer vision and robotics research.