Curriculum Multi-Task Self-Supervision Improves Lightweight Architectures for Onboard Satellite Hyperspectral Image Segmentation
Hugo Carlesso, Josiane Mothe, Radu Ionescu
AI summary
Problem
Onboard satellite hyperspectral image processing faces strict computational and energy constraints, yet existing lightweight models lack effective self-supervised pretraining strategies to capture complex spatial-spectral structures efficiently.
Approach
The authors introduce CMTSSL, a framework that combines masked image modeling with decoupled spatial and spectral jigsaw puzzle solving, guided by a curriculum learning strategy that progressively trains on harder samples ranked by 3D gradient magnitudes.
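As a rough illustration of the curriculum idea, the sketch below scores each hyperspectral patch by its mean 3D gradient magnitude and exposes progressively harder patches over the epochs. It is not the authors' implementation: the function names, the `(H, W, B)` patch layout, the linear pacing schedule, and the `start_frac` parameter are all assumptions made for this example.

```python
import numpy as np


def gradient_difficulty(cube):
    """Mean 3D gradient magnitude of one (H, W, B) patch (assumed layout)."""
    # np.gradient returns one array per axis: rows, columns, spectral bands.
    gy, gx, gb = np.gradient(cube.astype(np.float64))
    return float(np.sqrt(gx ** 2 + gy ** 2 + gb ** 2).mean())


def curriculum_order(patches):
    """Return patch indices sorted from easiest (smoothest) to hardest."""
    scores = np.array([gradient_difficulty(p) for p in patches])
    return np.argsort(scores, kind="stable"), scores


def curriculum_subset(order, epoch, num_epochs, start_frac=0.3):
    """Hypothetical linear pacing: start with the easiest fraction of the
    data and grow to the full set by the last epoch."""
    progress = epoch / max(1, num_epochs - 1)
    frac = min(1.0, start_frac + (1.0 - start_frac) * progress)
    k = max(1, int(round(frac * len(order))))
    return order[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Ten synthetic patches with increasing noise, hence increasing difficulty.
    patches = [rng.normal(scale=s, size=(8, 8, 16)) for s in np.linspace(0.0, 1.0, 10)]
    order, scores = curriculum_order(patches)
    for epoch in range(3):
        subset = curriculum_subset(order, epoch, num_epochs=3)
        print(epoch, len(subset))
```

A smooth patch receives a low score and is seen early, while a patch with sharp spatial or spectral transitions is deferred to later epochs. The pacing function here is only one possible choice.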
Key results
- Consistent accuracy gains across four public hyperspectral benchmarks
- New state-of-the-art 93.5% average accuracy on the HYPSO dataset
- Performance boost for lightweight models without increasing model size or FLOPs
- Encoder-agnostic pretraining strategy optimized for edge deployment
Why it matters
Enables robust, efficient hyperspectral analysis on resource-constrained satellites, reducing downlink bandwidth requirements and advancing real-time Earth observation.
Abstract
Hyperspectral imaging (HSI) captures detailed spectral signatures across hundreds of contiguous bands per pixel, making it indispensable for remote sensing applications such as land-cover classification, change detection, and environmental monitoring. Due to the high dimensionality of HSI data and the slow rate of data transfer in satellite-based systems, compact and efficient models are required to support onboard processing and minimize the transmission of redundant or low-value data. To this end, we introduce a novel curriculum multi-task self-supervised learning (CMTSSL) framework designed for lightweight architectures for HSI analysis. CMTSSL integrates masked image modeling with decoupled spatial and spectral jigsaw puzzle solving, guided by a curriculum learning strategy that progressively increases data difficulty during self-supervision. This enables the encoder to jointly capture fine-grained spectral continuity, spatial structure, and global semantic features. Unlike prior dual-task SSL methods, CMTSSL simultaneously addresses spatial and spectral reasoning within a unified and computationally efficient design, which makes it particularly suitable for training lightweight models for onboard satellite deployment. We validate our approach on four public benchmark datasets, demonstrating consistent gains in downstream segmentation tasks, using architectures that are over 16,000× lighter than some state-of-the-art models. These results highlight the potential of CMTSSL in generalizable representation learning with lightweight architectures for real-world HSI applications. Our code is publicly available at: https://github.com/hugocarlesso/CMTSSL.
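To make the notion of decoupled spatial and spectral jigsaw puzzles more concrete, the following sketch builds the two pretext inputs separately: one shuffles spatial tiles while leaving the band order intact, and the other shuffles groups of bands while leaving the spatial layout intact. This is a minimal sketch under assumptions, not the code from the repository above: the function names, the `(H, W, B)` layout, the grid size, and the number of band groups are hypothetical, and the masked image modeling branch is omitted.

```python
import numpy as np


def spatial_jigsaw(cube, grid, rng):
    """Shuffle grid x grid spatial tiles of an (H, W, B) patch.

    Returns the shuffled patch and the permutation used as the target.
    Rows and columns that do not fit an even grid are cropped.
    """
    h, w, _ = cube.shape
    th, tw = h // grid, w // grid
    tiles = [cube[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(grid) for j in range(grid)]
    perm = rng.permutation(grid * grid)
    rows = [np.concatenate([tiles[perm[r * grid + c]] for c in range(grid)], axis=1)
            for r in range(grid)]
    return np.concatenate(rows, axis=0), perm


def spectral_jigsaw(cube, groups, rng):
    """Shuffle contiguous band groups along the spectral axis only."""
    chunks = np.array_split(cube, groups, axis=2)
    perm = rng.permutation(groups)
    return np.concatenate([chunks[k] for k in perm], axis=2), perm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cube = rng.normal(size=(8, 8, 12))
    spatial_view, spatial_target = spatial_jigsaw(cube, grid=2, rng=rng)
    spectral_view, spectral_target = spectral_jigsaw(cube, groups=4, rng=rng)
    print(spatial_view.shape, spatial_target)
    print(spectral_view.shape, spectral_target)
```

In a multi-task setup, an encoder would receive such views and be trained to recover each permutation with its own prediction head, alongside a reconstruction loss for masked regions.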