Abstract
Background/Objectives: Accurate and reliable automated dermoscopic lesion classification remains challenging because of pronounced dataset bias, limited expert-annotated data, and the poor cross-dataset generalization of conventional supervised deep learning models. In clinical dermatology, these limitations restrict the deployment of data-driven diagnostic systems across diverse acquisition settings and patient populations. Methods: Motivated by these challenges, this study proposes a transformer-based, dermatology-specific foundation model that learns transferable visual representations from large collections of unlabeled dermoscopic images via self-supervised pretraining. The model integrates large-scale, dermatology-oriented self-supervised learning with a hierarchical vision transformer backbone, enabling it to capture both fine-grained lesion textures and global morphological patterns. Evaluation is conducted on three publicly available dermoscopic datasets (ISIC 2018, HAM10000, and PH2) under in-dataset, cross-dataset, limited-label, ablation, and computational-efficiency settings. Results: The proposed approach achieves in-dataset classification accuracies of 94.87%, 97.32%, and 98.17% on ISIC 2018, HAM10000, and PH2, respectively, outperforming strong transformer and hybrid baselines. Cross-dataset transfer experiments show consistent gains of 3.5–5.8% over supervised counterparts, indicating improved robustness to domain shift. Furthermore, when fine-tuned with only 10% of the labeled training data, the model achieves performance comparable to fully supervised baselines, highlighting strong data efficiency. Conclusions: These results demonstrate that dermatology-specific foundation learning offers a principled and practical solution for robust dermoscopic lesion classification under realistic clinical constraints.