| Title |
Part-of-speech tagging via deep neural networks for Northern-Ethiopic languages |
| Authors |
Gebremichael Tesfagergish, Senait ; Kapočiūtė-Dzikienė, Jurgita |
| DOI |
10.5755/j01.itc.49.4.26808 |
| Is Part of |
Information technology and control = Informacinės technologijos ir valdymas. Kaunas : Technologija. 2020, vol. 49, no. 4, p. 482-494. ISSN 1392-124X. eISSN 2335-884X |
| Keywords [eng] |
Deep Learning ; word2vec embeddings ; part-of-speech tagging ; natural language processing ; computational linguistics ; Tigrinya language |
| Abstract [eng] |
Deep Neural Networks (DNNs) have proven especially successful in Natural Language Processing (NLP) and in Part-Of-Speech (POS) tagging, which is the process of mapping words to their corresponding POS labels depending on the context. Despite recent developments in language technologies, low-resourced languages (such as the East African language Tigrinya) have received too little attention. We investigate the effectiveness of Deep Learning (DL) solutions for the low-resourced Tigrinya language of the Northern-Ethiopic branch. We selected Tigrinya as the testbed and tested state-of-the-art DL approaches, seeking to build the most accurate POS tagger. We evaluated DNN classifiers (Feed Forward Neural Network, FFNN; Long Short-Term Memory, LSTM; Bidirectional LSTM, BiLSTM; and Convolutional Neural Network, CNN) on top of neural word2vec word embeddings with a small training corpus known as the Nagaoka Tigrinya Corpus [19]. To determine the best DNN classifier type, its architecture, and its hyper-parameter set, both manual and automatic hyper-parameter tuning were performed. The BiLSTM method proved the most suitable for this task: it achieved the highest accuracy, ~92%, which is ~65% above the random baseline. |
| Published |
Kaunas : Technologija |
| Type |
Journal article |
| Language |
English |
| Publication date |
2020 |