Title Hybrid state-space and vision transformer framework for fetal ultrasound plane classification in prenatal diagnostics
Authors Tehsin, Sara ; Alshaya, Hend ; Bouchelligua, Wided ; Nasir, Inzamam Mashood
DOI 10.3390/diagnostics15222879
Is Part of Diagnostics. Basel : MDPI. 2025, vol. 15, iss. 22, art. no. 2879, p. 1-32. ISSN 2075-4418
Keywords [eng] fetal ultrasound ; multi-task learning ; prenatal diagnostics ; state-space models ; vision transformers
Abstract [eng] Background and Objective: Accurate classification of standard fetal ultrasound planes is a critical step in prenatal diagnostics, enabling reliable biometric measurements and anomaly detection. Conventional deep learning approaches, particularly convolutional neural networks (CNNs) and transformers, often face challenges such as domain variability, noise artifacts, class imbalance, and poor calibration, which limit their clinical utility. This study proposes a hybrid state-space and vision transformer framework designed to address these limitations by integrating sequential dynamics and global contextual reasoning. Methods: The proposed framework comprises five stages: (i) preprocessing for ultrasound harmonization using intensity normalization, anisotropic diffusion filtering, and affine alignment; (ii) hybrid feature encoding with a state-space model (SSM) for sequential dependency modeling and a vision transformer (ViT) for global self-attention; (iii) multi-task learning (MTL) with anatomical regularization leveraging classification, segmentation, and biometric regression objectives; (iv) gated decision fusion for balancing local sequential and global contextual features; and (v) calibration strategies using temperature scaling and entropy regularization to ensure reliable confidence estimation. The framework was comprehensively evaluated on three publicly available datasets: FETAL_PLANES_DB, HC18, and a large-scale fetal head dataset. Results: The hybrid framework consistently outperformed baseline CNN, SSM-only, and ViT-only models across all tasks. On FETAL_PLANES_DB, it achieved an accuracy of 95.8%, a macro-F1 of 94.9%, and an ECE of 1.5%. On the Fetal Head dataset, the model achieved 94.1% accuracy and a macro-F1 score of 92.8%, along with superior calibration metrics. For HC18, it achieved a Dice score of 95.7%, an IoU of 91.7%, and a mean absolute error of 2.30 mm for head circumference estimation. Cross-dataset evaluations confirmed the model's robustness and generalization capability. Ablation studies further demonstrated the critical role of SSM, ViT, fusion gating, and anatomical regularization in achieving optimal performance. Conclusions: By combining state-space dynamics and transformer-based global reasoning, the proposed framework delivers accurate, calibrated, and clinically meaningful predictions for fetal ultrasound plane classification and biometric estimation. The results highlight its potential for deployment in real-time prenatal screening and diagnostic systems.
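The gated decision fusion and temperature-scaling calibration described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code; the layer sizes, the six-class output (an assumption loosely matching FETAL_PLANES_DB), and the module name `GatedFusionHead` are illustrative. It shows a sigmoid gate balancing an SSM feature vector against a ViT feature vector before a classification head, with a learnable temperature applied to the logits.

```python
# Minimal sketch (not the authors' implementation) of gated fusion of
# state-space (SSM) and vision transformer (ViT) features plus temperature
# scaling for calibrated confidence. Dimensions and names are assumptions.
import torch
import torch.nn as nn

class GatedFusionHead(nn.Module):
    def __init__(self, feat_dim: int = 256, num_classes: int = 6):
        super().__init__()
        # Gate computed from the concatenated SSM and ViT feature vectors.
        self.gate = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.Sigmoid())
        self.classifier = nn.Linear(feat_dim, num_classes)
        # Learnable log-temperature for post-hoc calibration (temperature scaling).
        self.log_temperature = nn.Parameter(torch.zeros(1))

    def forward(self, h_ssm: torch.Tensor, h_vit: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([h_ssm, h_vit], dim=-1))  # element-wise gate in [0, 1]
        fused = g * h_ssm + (1.0 - g) * h_vit             # convex combination of the two branches
        logits = self.classifier(fused)
        return logits / torch.exp(self.log_temperature)   # temperature-scaled logits

if __name__ == "__main__":
    # Random features stand in for the outputs of the SSM and ViT encoders.
    head = GatedFusionHead()
    h_ssm, h_vit = torch.randn(4, 256), torch.randn(4, 256)
    print(head(h_ssm, h_vit).shape)  # torch.Size([4, 6])
```

In this sketch the gate lets the model weight local sequential (SSM) evidence against global contextual (ViT) evidence per feature dimension, while the temperature parameter is intended to be tuned on held-out data so that softmax confidences better match observed accuracy (the low ECE reported in the abstract).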
Published Basel : MDPI
Type Journal article
Language English
Publication date 2025