Title |
Part-of-speech tagging via deep neural networks for Northern-Ethiopic languages |
Authors |
Gebremichael Tesfagergish, Senait ; Kapočiūtė-Dzikienė, Jurgita |
DOI |
10.5755/j01.itc.49.4.26808 |
Is Part of |
Information technology and control = Informacinės technologijos ir valdymas. Kaunas : Technologija. 2020, vol. 49, no. 4, p. 482-494. ISSN 1392-124X. eISSN 2335-884X |
Keywords [eng] |
Deep Learning ; word2vec embeddings ; part-of-speech tagging ; natural language processing ; computational linguistics ; Tigrinya language |
Abstract [eng] |
Deep Neural Networks (DNNs) have proven especially successful in the area of Natural Language Processing (NLP) and in Part-Of-Speech (POS) tagging, the process of mapping words to their corresponding POS labels depending on the context. Despite recent developments in language technologies, low-resourced languages (such as the East African language Tigrinya) have received too little attention. We investigate the effectiveness of Deep Learning (DL) solutions for the low-resourced Tigrinya language of the Northern-Ethiopic branch. We selected Tigrinya as the testbed and tested state-of-the-art DL approaches, seeking to build the most accurate POS tagger. We evaluated DNN classifiers (Feed Forward Neural Network – FFNN, Long Short-Term Memory – LSTM, Bidirectional LSTM – BiLSTM, and Convolutional Neural Network – CNN) on top of neural word2vec word embeddings with a small training corpus known as the Nagaoka Tigrinya Corpus [19]. To determine the best DNN classifier type, its architecture, and its hyper-parameter set, both manual and automatic hyper-parameter tuning were performed. The BiLSTM method proved the most suitable for our task: it achieved the highest accuracy, ~92%, which is ~65% above the random baseline. |
Published |
Kaunas : Technologija |
Type |
Journal article |
Language |
English |
Publication date |
2020 |