Abstract [eng]
This work addresses the challenge of real-time facial attribute recognition (emotion, age, and gender) for Human-Robot Interaction (HRI), focusing on deployment on resource-constrained embedded systems. A full pipeline was developed and evaluated, comparing deep learning architectures (CNNs, transfer-learning models, and a Hybrid CNN-Transformer), applying TensorRT optimization, and running real-time evaluation on a Jetson Xavier NX using both public datasets and a live experiment with 36 participants. The Hybrid CNN-Transformer offered a good trade-off between performance and efficiency. With TensorRT FP16 optimization, the system reached real-time inference (64 FPS) across all three tasks, achieving up to 26× speedups without sacrificing accuracy. The live experiment, however, revealed challenges, especially in emotion recognition, where performance dropped due to class imbalance, subtle expressions, and pose variation. The gender and age tasks also showed biases depending on head pose and demographics. Overall, the thesis shows that real-time multi-task facial analysis on edge devices is feasible, but further work is needed to improve robustness, handle dataset imbalance, and mitigate bias: key steps toward more perceptive and reliable social robots.
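The real-time figures quoted above can be sanity-checked with simple latency arithmetic. The sketch below is illustrative only: it computes the per-frame time budget implied by 64 FPS, and what a 26× speedup means for an assumed (hypothetical, not from the thesis) baseline latency of 200 ms.

```python
def frame_budget_ms(fps: float) -> float:
    """Per-frame time budget in milliseconds at a given frame rate."""
    return 1000.0 / fps

def optimized_latency_ms(baseline_ms: float, speedup: float) -> float:
    """Latency after an optimization yielding the given speedup factor."""
    return baseline_ms / speedup

# At 64 FPS, each frame must be processed within ~15.6 ms.
budget = frame_budget_ms(64)
# With an assumed 200 ms baseline, a 26x speedup gives ~7.7 ms,
# which fits inside the 64 FPS budget.
latency = optimized_latency_ms(200.0, 26)
print(round(budget, 1), round(latency, 1))  # → 15.6 7.7
```

The baseline latency here is a placeholder chosen to show the calculation; the abstract reports only the final frame rate and the speedup factor.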