A multi-scale simplicial transformer with graph attention for facial emotion recognition

Samia Nawaz Yousafzai; Inzamam Mashood Nasir; Oumaima Saidani; Refka Ghodhbani; Yeonghyeon Gu; Muhammad Syafrudin; Norma Latif Fitriyani

doi:10.1016/j.asej.2025.103584

Title	A multi-scale simplicial transformer with graph attention for facial emotion recognition
Authors	Yousafzai, Samia Nawaz ; Nasir, Inzamam Mashood ; Saidani, Oumaima ; Ghodhbani, Refka ; Gu, Yeonghyeon ; Syafrudin, Muhammad ; Fitriyani, Norma Latif
DOI	10.1016/j.asej.2025.103584
Is Part of	Ain Shams Engineering Journal. Amsterdam : Elsevier. 2025, vol. 16, iss. 10, art. no. 103584, p. 668-676. ISSN 2090-4479. eISSN 2090-4495
Keywords [eng]	Explainable AI ; Face detection ; Facial expression recognition ; Graph attention network ; Hybrid adaptive attention ; Simplicial transformer
Abstract [eng]	Facial Emotion Recognition (FER) plays a vital role in human-computer interaction and affective computing, but it faces challenges such as occluded views and varying facial poses. Our approach employs a graph-based FER framework that integrates multi-scale feature extraction with adaptive attention mechanisms for accurate emotion detection. First, YOLOv8 detects faces, enabling the construction of multi-scale graphs that capture spatial relationships among facial features. A hybrid adaptive attention mechanism refines these features before a simplicial transformer network processes them to capture dependencies. A graph attention network enhances edge weighting, thereby improving recognition performance. The proposed model is evaluated on two benchmark datasets, AffectNet and FER2013, achieving accuracies of 81.84% and 90.40%, respectively. On the occlusion and pose subsets of AffectNet, the model improves accuracy by 3.7% and 4.2%, respectively, over the strongest baseline. Furthermore, cross-dataset validation reaches its highest accuracy, 98.54%, when the model is trained on the combined AffectNet and FER2013 data and tested on the additional CK+ dataset. Across these datasets, paired t-tests and Wilcoxon signed-rank tests confirm statistical significance, with p-values consistently below 0.05, validating the robustness of the performance gains. Grad-CAM and t-SNE visualizations further confirm the model's discriminative power and its focus on expressive facial regions. These results demonstrate strong generalization and practical applicability of the proposed approach in real-world FER scenarios.
Published	Amsterdam : Elsevier
Type	Journal article
Language	English
Publication date	2025
CC license
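
The abstract credits a graph attention network with improving edge weighting in the face graph. The following is a minimal sketch of how such edge weights are typically computed, assuming the standard single-head graph attention (GAT) formulation, alpha_ij = softmax_j(LeakyReLU(a^T [W h_i || W h_j])) taken over each node's neighbours. It is not the authors' implementation: the function names `gat_layer` and `leaky_relu`, the toy 4-node graph, and the random parameters are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    """Elementwise LeakyReLU applied to the raw attention logits."""
    return np.where(x > 0, x, slope * x)


def gat_layer(h: np.ndarray, adj: np.ndarray, w: np.ndarray, a: np.ndarray):
    """Hypothetical single-head graph attention layer (illustrative sketch only).

    h   : (N, F) node features, e.g. descriptors of facial regions
    adj : (N, N) binary adjacency matrix; self-loops are added here
    w   : (F, F_out) shared linear projection
    a   : (2 * F_out,) attention vector
    Returns (h_out, alpha), where alpha[i, j] is the weight node i assigns
    to neighbour j (each row sums to 1; non-neighbours get exactly 0).
    """
    n = h.shape[0]
    z = h @ w                                        # projected features, (N, F_out)
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a term for node i plus a term for node j
    left = z @ a[:f_out]                             # (N,)
    right = z @ a[f_out:]                            # (N,)
    e = leaky_relu(left[:, None] + right[None, :])   # raw logits, (N, N)
    mask = (adj + np.eye(n)) > 0                     # neighbours plus self-loop
    e = np.where(mask, e, -np.inf)                   # exp(-inf) = 0 for non-edges
    e = e - e.max(axis=1, keepdims=True)             # stabilise the softmax
    exp_e = np.exp(e)
    alpha = exp_e / exp_e.sum(axis=1, keepdims=True) # row-wise masked softmax
    h_out = alpha @ z                                # attention-weighted aggregation
    return h_out, alpha


# Toy usage on a hypothetical 4-node path graph with random parameters.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
w = rng.normal(size=(3, 2))
a = rng.normal(size=(4,))
h_out, alpha = gat_layer(h, adj, w, a)
print(h_out.shape)  # (4, 2)
```

Splitting the concatenation into two dot products is algebraically equivalent to the original formula and avoids building all N x N concatenated pairs. A multi-head variant would run several such layers in parallel and concatenate or average their outputs.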
