Thermal Image Refinement with Depth Estimation Using Recurrent Networks for Monocular ORB-SLAM3
and Erdal Kayacan
AI summary
Problem
Conventional RGB cameras fail in dark or visually degraded environments, while thermal cameras lack texture and consistency for reliable SLAM. Existing solutions often require expensive radiometric sensors or struggle with real-time deployment on UAVs.
Approach
The authors propose T-RefNet to enhance raw thermal inputs for feature extraction, paired with a recurrent depth estimator that enforces temporal consistency, all integrated into monocular ORB-SLAM3 for metric-scale tracking.
Key results
- Achieves an absolute relative depth error of ~0.06 on the radiometric VIVID++ (indoor–dark) dataset, versus more than 0.11 for baselines
- Keeps the error below 0.10 on the custom non-radiometric indoor data, where baselines exceed 0.24
- Maintains a mean trajectory error under 0.4 m with thermal-only ORB-SLAM3 in real-world UAV flights
- Introduces a publicly available non-radiometric thermal-depth UAV dataset for benchmarking
Why it matters
The approach offers a practical, low-cost path to reliable autonomous UAV navigation in GPS-denied, dark, or smoke-filled environments where standard vision fails.
Abstract
Autonomous navigation in GPS-denied and visually degraded environments remains challenging for unmanned aerial vehicles (UAVs). To this end, we investigate the use of a monocular thermal camera as a standalone sensor on a UAV platform for real-time depth estimation and simultaneous localization and mapping (SLAM). To extract depth information from thermal images, we propose a novel pipeline employing a lightweight supervised network with recurrent blocks (RBs) integrated to capture temporal dependencies, enabling more robust predictions. The network combines lightweight convolutional backbones with a thermal refinement network (T-RefNet) to refine raw thermal inputs and enhance feature visibility. The refined thermal images and predicted depth maps are integrated into ORB-SLAM3, enabling thermal-only localization. Unlike previous methods, the network is trained on a custom non-radiometric dataset, obviating the need for high-cost radiometric thermal cameras. Experimental results on datasets and UAV flights demonstrate competitive depth accuracy and robust SLAM performance under low-light conditions. On the radiometric VIVID++ (indoor–dark) dataset, our method achieves an absolute relative error of approximately 0.06, compared to baselines exceeding 0.11. In our non-radiometric indoor set, baseline errors remain above 0.24, whereas our approach remains below 0.10. Thermal-only ORB-SLAM3 maintains a mean trajectory error under 0.4 m.
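For readers unfamiliar with the depth metric quoted above, the absolute relative error is conventionally the mean of |predicted depth - true depth| / true depth over pixels with valid ground truth. The sketch below illustrates that standard definition only; the function name `abs_rel_error`, the `min_depth` validity threshold, and the toy values are assumptions made for illustration and are not taken from the paper's evaluation code.

```python
def abs_rel_error(pred, gt, min_depth=1e-3):
    """Mean of |pred - gt| / gt over depth values with valid ground truth.

    pred, gt: equal-length sequences of depths in metres (a flattened depth map).
    min_depth: hypothetical threshold; ground truth at or below it is treated
               as missing and skipped (an assumption, not the paper's rule).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    ratios = [abs(p - g) / g for p, g in zip(pred, gt) if g > min_depth]
    if not ratios:
        raise ValueError("no valid ground-truth depths")
    return sum(ratios) / len(ratios)


# Toy example: the third ground-truth value is 0.0 (missing) and is ignored.
gt = [2.0, 4.0, 0.0, 5.0]
pred = [2.2, 3.6, 1.0, 5.0]
score = abs_rel_error(pred, gt)
print(f"abs rel = {score:.4f}")
```

In this toy case the three valid pixels have relative errors of 0.1, 0.1, and 0.0. A score of 0.06 therefore means that predicted depths deviate from the ground truth by about 6% on average.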