AnyThermal: Towards Learning Universal Representations for Thermal Perception
Parv Maheshwari, Jay Karhade, Yogesh Chawla, Isaiah Adu, Florian Heisen, Andrew Porco, Andrew Jong, Yifei Liu, Santosh Pitla, Sebastian Scherer, Wenshan Wang
AI summary
Problem
Thermal perception lacks large-scale, diverse datasets and robust feature extractors, as existing methods rely on small-scale, single-environment data or fail to capture thermal-specific cues when adapted from RGB models.
Approach
The authors distill features from a frozen RGB foundation model (DINOv2) into a thermal encoder using multi-environment RGB-Thermal data, and introduce the open-source TartanRGBT platform and dataset to bridge the data diversity gap.
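As a rough illustration of the distillation idea, the sketch below scores how closely a thermal student's patch features match those of a frozen RGB teacher for a paired image. The cosine-distance objective, the function names, and the toy feature vectors are all assumptions made for illustration; the summary does not state which loss or feature layout the authors use.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def distillation_loss(teacher_feats, student_feats):
    """Mean (1 - cosine similarity) over spatially aligned patch features.

    teacher_feats: per-patch features from the frozen RGB teacher
                   (hypothetical stand-in for DINOv2 outputs).
    student_feats: per-patch features from the thermal student for the
                   paired thermal image.
    The choice of cosine distance is an assumption, not the paper's
    confirmed objective.
    """
    if len(teacher_feats) != len(student_feats):
        raise ValueError("teacher and student must have the same number of patches")
    if not teacher_feats:
        raise ValueError("at least one patch feature is required")
    per_patch = [
        1.0 - cosine_similarity(t, s)
        for t, s in zip(teacher_feats, student_feats)
    ]
    return sum(per_patch) / len(per_patch)


# Toy example: two patches with 3-dimensional features.
teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
aligned_student = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]     # same directions
misaligned_student = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]  # orthogonal directions

loss_aligned = distillation_loss(teacher, aligned_student)
loss_misaligned = distillation_loss(teacher, misaligned_student)
```

In a real training loop only the student would be updated to reduce this loss; the teacher stays frozen so that the thermal features are pulled toward the RGB feature space.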
Key results
- Task-agnostic thermal encoder outperforms baselines in cross-modal place recognition, segmentation, and depth estimation.
- Introduces TartanRGBT, the first open-source hardware/software platform for synchronized stereo RGB-Thermal data collection.
- Releases TartanRGBT dataset spanning urban, indoor, aerial, and off-road environments with 16,943 synchronized pairs.
- Achieves improvements of up to 36% over existing methods on existing datasets, across diverse environments and downstream tasks.
Why it matters
Enables robust, general-purpose thermal perception for robotics and autonomous systems in challenging real-world conditions where RGB sensors fail.
Abstract
We present AnyThermal, a thermal backbone that captures robust task-agnostic thermal features suitable for a variety of tasks such as cross-modal place recognition, thermal segmentation, and monocular depth estimation from thermal images. Existing thermal backbones that follow task-specific training from small-scale data result in utility limited to a specific environment and task. Unlike prior methods, AnyThermal can be used for a wide range of environments (indoor, aerial, off-road, urban) and tasks, all without task-specific training. Our key insight is to distill the feature representations from visual foundation models such as DINOv2 into a thermal encoder using thermal data from these multiple environments. To bridge the diversity gap of the existing RGB-Thermal datasets, we introduce the TartanRGBT platform, the first open-source data collection platform with synced RGB-Thermal image acquisition. We use this payload to collect the TartanRGBT dataset, a diverse and balanced dataset collected in 4 environments. We demonstrate the efficacy of AnyThermal and TartanRGBT, achieving state-of-the-art results with improvements of up to 36% across diverse environments and downstream tasks on existing datasets.
∗ Equal contribution
1 Authors are with Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA. {parvm, jkarhade, ajong, yifeil5, basti, wenshanw}@andrew.cmu.edu
2 Authors are with Biological Systems Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA. {ychawla2, spitla}@nebraska.edu
3 Authors are with Mechanical Engineering, Penn State University, University Park, PA, USA. ioa5099@psu.edu
4 Authors are with the School of Engineering and Design, Technical University of Munich, Munich, Germany. florian.heisen@tum.com
5 Authors are with Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA. aporco@andrew.cmu.edu