MonoTher-Depth: Enhancing Thermal Depth Estimation Via Confidence-Aware Distillation
Xingxing Zuo, Nikhil Ranganathan, Connor Lee, Georgia Gkioxari, Soon-Jo Chung
AI summary
Problem
Thermal cameras excel in low-light and obscured conditions, but thermal depth estimation lacks labeled datasets and suffers from low contrast and noise, while existing cross-modal methods require perfectly aligned RGB-thermal pairs or video sequences.
Approach
The authors introduce MonoTher-Depth, a framework that distills a pretrained RGB depth model into a thermal model using a confidence-aware loss, which relies on the RGB model's predicted confidence to decide where its guidance should be trusted.
Key results
- Introduces MonoTher-Depth, a semi-supervised framework bridging RGB and thermal depth estimation.
- Proposes confidence-aware distillation that weights RGB guidance by the RGB model's predicted confidence.
- Reduces absolute relative error by 22.88% compared to a baseline without distillation, on new scenarios that have no labeled depth.
- Eliminates the need for strictly co-registered RGB-thermal image pairs or sequential video data during training.
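To make the error figure in the list above concrete, here is a minimal sketch of the absolute relative error (AbsRel) metric and of a percentage reduction against a baseline. It assumes the reported 22.88% is a relative reduction in AbsRel; the function names, the validity mask convention, and the toy depth values are illustrative and are not taken from the paper.

```python
import numpy as np

def abs_rel(pred, gt, valid=None):
    """Mean absolute relative error: mean(|pred - gt| / gt) over valid pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        # Assumption: non-positive depth marks pixels without ground truth.
        valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

def relative_reduction(baseline_err, new_err):
    """Percentage by which new_err improves on baseline_err."""
    return 100.0 * (baseline_err - new_err) / baseline_err

# Toy depth maps in metres (hypothetical values, not data from the paper).
gt = np.array([[2.0, 4.0], [0.0, 10.0]])         # 0.0 = no ground truth
baseline = np.array([[2.5, 3.0], [1.0, 12.0]])   # thermal model without distillation
distilled = np.array([[2.2, 3.6], [1.0, 11.0]])  # thermal model with distillation

err_base = abs_rel(baseline, gt)
err_dist = abs_rel(distilled, gt)
print(f"AbsRel baseline={err_base:.4f} distilled={err_dist:.4f}")
print(f"relative reduction={relative_reduction(err_base, err_dist):.2f}%")
```

Under this reading, a baseline AbsRel of 0.2000 would drop to about 0.1542 after a 22.88% reduction.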
Why it matters
Enables reliable depth perception for autonomous systems and robots operating in adverse visual conditions like fog, smoke, and darkness.
Abstract
Monocular depth estimation (MDE) from thermal images is a crucial technology for robotic systems operating in challenging conditions such as fog, smoke, and low light. The limited availability of labeled thermal data constrains the generalization capabilities of thermal MDE models compared to foundational RGB MDE models, which benefit from datasets of millions of images across diverse scenarios. To address this challenge, we introduce a novel pipeline that enhances thermal MDE through knowledge distillation from a versatile RGB MDE model. Our approach features a confidence-aware distillation method that utilizes the predicted confidence of the RGB MDE to selectively strengthen the thermal MDE model, capitalizing on the strengths of the RGB model while mitigating its weaknesses. Our method significantly improves the accuracy of the thermal MDE, independent of the availability of labeled depth supervision, and greatly expands its applicability to new scenarios. In our experiments on new scenarios without labeled depth, the proposed confidence-aware distillation method reduces the absolute relative error of thermal MDE by 22.88% compared to the baseline without distillation. The code will be available at: https://github.com/ZuoJiaxing/monother depth.
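The idea of selectively trusting the RGB teacher can be illustrated with a confidence-weighted distillation loss. The abstract does not give the loss formula, so the sketch below is an assumption: a per-pixel L1 difference between the thermal (student) depth and the RGB (teacher) depth, weighted by the teacher's confidence and normalized by the total confidence. All names and values are hypothetical, and the teacher depth is assumed to be already expressed in the thermal camera's frame.

```python
import numpy as np

def confidence_weighted_distillation_loss(student_depth, teacher_depth,
                                          teacher_conf, eps=1e-8):
    """Confidence-weighted L1 distillation loss (illustrative, not the paper's exact loss).

    Pixels where the teacher is confident dominate the loss; pixels with
    confidence near zero contribute almost nothing.
    """
    student = np.asarray(student_depth, dtype=np.float64)
    teacher = np.asarray(teacher_depth, dtype=np.float64)
    # Assumption: confidence is a per-pixel value in [0, 1].
    conf = np.clip(np.asarray(teacher_conf, dtype=np.float64), 0.0, 1.0)
    residual = np.abs(student - teacher)
    return float(np.sum(conf * residual) / (np.sum(conf) + eps))

# Toy 2x2 example (hypothetical values).
teacher = np.array([[2.0, 4.0], [6.0, 8.0]])  # RGB teacher depth
student = np.array([[2.5, 4.0], [6.0, 5.0]])  # thermal student depth
conf = np.array([[1.0, 1.0], [1.0, 0.0]])     # teacher distrusts the last pixel

weighted = confidence_weighted_distillation_loss(student, teacher, conf)
uniform = confidence_weighted_distillation_loss(student, teacher, np.ones_like(conf))
print(f"weighted={weighted:.4f} uniform={uniform:.4f}")
```

In this toy case the large disagreement at the low-confidence pixel is ignored, so the weighted loss is much smaller than the uniform one. That is the intended effect: the student is not pushed toward teacher predictions that the teacher itself flags as unreliable.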