| Title |
A multi-scale simplicial transformer with graph attention for facial emotion recognition |
| Authors |
Yousafzai, Samia Nawaz ; Nasir, Inzamam Mashood ; Saidani, Oumaima ; Ghodhbani, Refka ; Gu, Yeonghyeon ; Syafrudin, Muhammad ; Fitriyani, Norma Latif |
| DOI |
10.1016/j.asej.2025.103584 |
| Is Part of |
Ain Shams Engineering Journal. Amsterdam : Elsevier. 2025, vol. 16, iss. 10, art. no. 103584, p. 668-676. ISSN 2090-4479. eISSN 2090-4495 |
| Keywords [eng] |
Explainable AI ; Face detection ; Facial expression recognition ; Graph attention network ; Hybrid adaptive attention ; Simplicial transformer |
| Abstract [eng] |
Facial Emotion Recognition (FER) plays a vital role in human-computer interaction and affective computing, but it faces challenges such as occluded views and varying facial poses. Our approach employs a graph-based FER framework that integrates multi-scale feature extraction with adaptive attention mechanisms for accurate emotion detection. First, YOLOv8 detects faces, enabling the construction of multi-scale graphs that capture spatial relationships among facial features. A hybrid adaptive attention mechanism refines these features before a simplicial transformer network processes them to capture dependencies. A graph attention network further improves edge weighting, thereby enhancing recognition performance. The proposed model is evaluated on two benchmark datasets, AffectNet and FER2013, achieving accuracies of 81.84% and 90.40%, respectively. On the occlusion and pose subsets of AffectNet, the model improves accuracy by 3.7% and 4.2%, respectively, over the strongest baseline. Furthermore, in cross-dataset validation, the highest accuracy of 98.54% is obtained by training on the combined AffectNet and FER2013 data and testing on the CK+ dataset. Across these datasets, paired t-tests and Wilcoxon signed-rank tests confirm statistical significance, with p-values consistently below 0.05, supporting the robustness of the performance gains. Grad-CAM and t-SNE visualizations further confirm the model's discriminative power and its focus on expressive facial regions. These results demonstrate strong generalization and practical applicability of the proposed approach in real-world FER scenarios. |
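The abstract states that a graph attention network weights the edges of the facial feature graph. As a rough illustration only, the sketch below implements the standard single-head graph attention layer (the GAT formulation of Veličković et al.) in NumPy; it is not the authors' layer, whose exact design is not given in this record. The function name `gat_layer`, the chain-shaped toy graph standing in for neighbouring facial regions, the random features, and the LeakyReLU slope of 0.2 are all hypothetical choices made for illustration.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    # LeakyReLU as used in standard GAT (slope 0.2 is an assumption here)
    return np.where(x > 0, x, slope * x)


def gat_layer(h, adj, W, a):
    """Single-head graph attention layer (standard GAT formulation, hypothetical helper).

    h:   (N, F) node features, e.g. hypothetical per-region facial features
    adj: (N, N) boolean adjacency; must include self-loops
    W:   (F, F_out) shared linear projection
    a:   (2 * F_out,) attention vector
    Returns the updated node features (N, F_out) and attention weights (N, N).
    """
    z = h @ W                              # project every node: (N, F_out)
    f_out = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]) splits into a source and a target term
    src = z @ a[:f_out]                    # (N,)
    dst = z @ a[f_out:]                    # (N,)
    e = leaky_relu(src[:, None] + dst[None, :])
    # Non-edges get -inf so they receive zero weight after the softmax
    e = np.where(adj, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over each node's neighbours
    return alpha @ z, alpha                # attention-weighted aggregation


# Toy usage: 5 hypothetical facial regions connected in a chain, with self-loops
rng = np.random.default_rng(0)
N, F, F_out = 5, 8, 4
h = rng.normal(size=(N, F))
adj = np.eye(N, dtype=bool)
for i in range(N - 1):
    adj[i, i + 1] = adj[i + 1, i] = True
W = rng.normal(size=(F, F_out))
a = rng.normal(size=2 * F_out)

out, alpha = gat_layer(h, adj, W, a)
print(out.shape)  # (5, 4)
```

The learned weights `alpha` play the role of the edge weights mentioned in the abstract: each node distributes a total weight of 1 across its neighbours, and unconnected node pairs receive exactly 0.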
| Published |
Amsterdam : Elsevier |
| Type |
Journal article |
| Language |
English |
| Publication date |
2025 |