Title A deep learning method for automatic SMS spam classification: performance of learning algorithms on indigenous dataset
Authors Abayomi-Alli, Olusola Oluwakemi ; Misra, Sanjay ; Abayomi-Alli, Adebayo
DOI 10.1002/cpe.6989
Is Part of Concurrency and computation: practice and experience. Hoboken, NJ : John Wiley & Sons. 2022, vol. 34, iss. 17, art. no. e6989, p. 1-15. ISSN 1532-0626. eISSN 1532-0634
Keywords [eng] algorithms ; classification ; deep learning ; machine learning ; short messages
Abstract [eng] SMS, one of the most popular and fastest-growing GSM value-added services worldwide, has attracted unwanted messages, also known as SMS spam. The effects of SMS spam are significant: it affects both users and service providers and causes a massive gap in trust between the two parties. This article presents a deep learning model based on BiLSTM and compares its results with some state-of-the-art machine learning (ML) algorithms on two datasets: our newly collected dataset and the popular UCI SMS dataset. The study aims to evaluate the performance of diverse learning models and to compare the results on the new, expanded dataset (ExAIS_SMS) using the following metrics: true positive (TP), false positive (FP), F-measure, recall, precision, and overall accuracy. The average accuracy of the BiLSTM model was moderately better than that of some of the ML classifiers. The experimental results improved significantly over the ground-truth results after effective fine-tuning of some of the parameters. The BiLSTM model attained an accuracy of 93.4% on the ExAIS_SMS dataset and 98.6% on the UCI dataset. Further comparison of the two datasets on the state-of-the-art ML classifiers gave accuracies for Naive Bayes, BayesNet, SOM, decision tree, C4.5, and J48 of 89.64%, 91.11%, 88.24%, 75.76%, 80.24%, and 79.2%, respectively, on the ExAIS_SMS dataset. In conclusion, our proposed BiLSTM model showed significant improvement over traditional ML classifiers. To further validate the robustness of our model, we applied it to the UCI dataset, and our results showed optimal performance in classifying SMS spam messages on the following metrics: accuracy, precision, recall, and F-measure.
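Note: the abstract reports accuracy, precision, recall, and F-measure but does not state how they were computed. The sketch below uses the standard definitions from confusion-matrix counts, treating spam as the positive class. The function name and the example counts are hypothetical and are not taken from the article.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives, with spam assumed to be the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only (not results from the article).
metrics = classification_metrics(tp=80, fp=10, tn=100, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts, accuracy is 180/200 and both precision and recall are 80/90, so the F-measure equals the same value as precision and recall.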
Published Hoboken, NJ : John Wiley & Sons
Type Journal article
Language English
Publication date 2022