ICRA 2026

MonoTher-Depth: Enhancing Thermal Depth Estimation Via Confidence-Aware Distillation

Xingxing Zuo, Nikhil Ranganathan, Connor Lee, Georgia Gkioxari, Soon-Jo Chung


AI summary

Key figure
Confidence-aware distillation from a pretrained RGB depth model reduces the absolute relative error of thermal depth estimation by 22.88% compared to a baseline without distillation, in zero-shot scenarios without labeled ground truth.
Thermal depth estimation, Monocular depth estimation, Knowledge distillation, Cross-modal learning, Confidence-aware training, Robotics perception

Problem

Thermal cameras excel in low-light and obscured conditions, but thermal depth estimation is hampered by scarce labeled datasets and by the low contrast and noise of thermal images, while existing cross-modal methods require perfectly aligned RGB-thermal pairs or video sequences.

Approach

The authors introduce MonoTher-Depth, a semi-supervised framework that distills a pretrained RGB depth model into a thermal model using a confidence-aware loss that weights guidance based on cross-modal feature similarity and depth consistency.
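
The confidence-weighted distillation described above can be illustrated with a small, framework-free Python sketch. It assumes a per-pixel confidence formed by multiplying a cross-modal feature-similarity term (cosine similarity between hypothetical RGB and thermal feature vectors, rescaled to [0, 1]) by a depth-consistency term exp(-|d_rgb - d_thermal| / tau), and uses that confidence to weight an L1 loss between the thermal student's depth and the RGB teacher's depth. The function names, the temperature `tau`, and the exact form of both terms are illustrative assumptions, not the paper's formulation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def confidence_weight(
    feat_rgb: Sequence[float],
    feat_thermal: Sequence[float],
    depth_rgb: float,
    depth_thermal: float,
    tau: float = 1.0,
) -> float:
    """Hypothetical per-pixel confidence in [0, 1] for the RGB teacher's guidance."""
    # Feature term (assumed form): map cosine similarity from [-1, 1] to [0, 1].
    feature_term = 0.5 * (cosine_similarity(feat_rgb, feat_thermal) + 1.0)
    # Depth term (assumed form): 1.0 when the two depths agree, decaying with disagreement.
    depth_term = math.exp(-abs(depth_rgb - depth_thermal) / tau)
    return feature_term * depth_term


def distillation_loss(
    student_depth: Sequence[float],
    teacher_depth: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Confidence-weighted L1 distillation loss, normalized by the total weight."""
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no trusted guidance anywhere
    weighted = sum(w * abs(s - t) for s, t, w in zip(student_depth, teacher_depth, weights))
    return weighted / total


if __name__ == "__main__":
    # Two pixels: features agree at the first and point in opposite directions at the second.
    feats_rgb = [[1.0, 0.0], [1.0, 0.0]]
    feats_thermal = [[1.0, 0.0], [-1.0, 0.0]]
    teacher = [2.0, 5.0]  # RGB teacher depths (illustrative values)
    student = [2.0, 3.0]  # thermal student depths (illustrative values)
    weights = [
        confidence_weight(fr, ft, t, s)
        for fr, ft, t, s in zip(feats_rgb, feats_thermal, teacher, student)
    ]
    print(weights, distillation_loss(student, teacher, weights))
```

In this toy example the second pixel's features disagree completely, so its confidence drops to zero and the teacher's depth there contributes nothing to the loss, even though student and teacher differ by 2 units. The weights are plain numbers here; in a real training framework, confidence masks are commonly treated as constants (outside the gradient path) so the student cannot reduce its loss simply by shrinking them.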

Key results

  • Introduces MonoTher-Depth, a semi-supervised framework bridging RGB and thermal depth estimation.
  • Proposes confidence-aware distillation that adaptively weights RGB guidance using cross-modal feature consistency.
  • Reduces the absolute relative error of zero-shot thermal depth estimation by 22.88% compared to the baseline without distillation, without ground-truth depth supervision.
  • Eliminates the need for strictly co-registered RGB-thermal image pairs or sequential video data during training.
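
The 22.88% figure refers to absolute relative error (AbsRel), a standard monocular depth metric. The sketch below computes AbsRel as the mean of |predicted - true| / true over pixels with valid ground truth, then the percentage reduction relative to a baseline. It assumes the 22.88% is a relative reduction (the usual convention) rather than a drop in percentage points; the function names and example depths are hypothetical and not taken from the paper.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Absolute relative error: mean of |pred - gt| / gt over pixels with positive ground truth."""
    # Pixels with non-positive ground truth are treated as missing and skipped.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0.0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def relative_reduction_percent(baseline_error: float, new_error: float) -> float:
    """Percentage by which new_error is lower than baseline_error."""
    if baseline_error <= 0.0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    gt = [2.0, 5.0, 0.0, 10.0]  # 0.0 marks a pixel without ground truth
    baseline_pred = [2.5, 4.0, 7.0, 12.0]
    distilled_pred = [2.2, 4.5, 7.0, 11.0]
    base = abs_rel(baseline_pred, gt)
    new = abs_rel(distilled_pred, gt)
    print(f"AbsRel {base:.3f} -> {new:.3f} ({relative_reduction_percent(base, new):.1f}% lower)")
```

For instance, a drop in AbsRel from 0.20 to 0.15 is a 25% relative reduction, even though the metric itself fell by only 0.05.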

Why it matters

Enables reliable depth perception for autonomous systems and robots operating in adverse visual conditions like fog, smoke, and darkness.

Abstract

Monocular depth estimation (MDE) from thermal images is a crucial technology for robotic systems operating in challenging conditions such as fog, smoke, and low light. The limited availability of labeled thermal data constrains the generalization capabilities of thermal MDE models compared to foundational RGB MDE models, which benefit from datasets of millions of images across diverse scenarios. To address this challenge, we introduce a novel pipeline that enhances thermal MDE through knowledge distillation from a versatile RGB MDE model. Our approach features a confidence-aware distillation method that utilizes the predicted confidence of the RGB MDE to selectively strengthen the thermal MDE model, capitalizing on the strengths of the RGB model while mitigating its weaknesses. Our method significantly improves the accuracy of the thermal MDE, independent of the availability of labeled depth supervision, and greatly expands its applicability to new scenarios. In our experiments on new scenarios without labeled depth, the proposed confidence-aware distillation method reduces the absolute relative error of thermal MDE by 22.88% compared to the baseline without distillation. The code will be available at: https://github.com/ZuoJiaxing/monother depth.

Index terms

Range Sensing, Deep Learning for Visual Perception
