Twenty years of machine-learning-based text classification: a systematic review

Ashokkumar Palanivinayagam; Claude Ziad El-Bayeh; Robertas Damaševičius

doi:10.3390/a16050236

Title	Twenty years of machine-learning-based text classification: a systematic review
Authors	Palanivinayagam, Ashokkumar ; El-Bayeh, Claude Ziad ; Damaševičius, Robertas
DOI	10.3390/a16050236
Is Part of	Algorithms. Basel : MDPI. 2023, vol. 16, iss. 5, art. no. 236, p. 1-28. ISSN 1999-4893
Keywords [eng]	machine learning ; natural language processing ; rating summarization ; sentiment analysis ; spam detection ; text classification
Abstract [eng]	Machine-learning-based text classification is one of the leading research areas and has a wide range of applications, including spam detection, hate speech identification, reviews, rating summarization, sentiment analysis, and topic modelling. Existing machine-learning-based research differs in terms of the datasets, training methods, performance evaluation, and comparison methods used. In this paper, we survey 224 papers published between 2003 and 2022 that employed machine learning for text classification. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement is used as the guideline for the systematic review process. The differences in the literature are analyzed in terms of six aspects: datasets, machine learning models, best accuracy, performance evaluation metrics, training and testing splitting methods, and comparisons among machine learning models. Furthermore, we highlight the limitations and research gaps in the literature. Although the research works included in the survey perform well in terms of text classification, improvement is required in many areas. We believe that this survey paper will be useful for researchers in the field of text classification.
Published	Basel : MDPI
Type	Journal article
Language	English
Publication date	2023
