Title Part-of-speech tagging via deep neural networks for Northern-Ethiopic languages
Authors Gebremichael Tesfagergish, Senait ; Kapočiūtė-Dzikienė, Jurgita
DOI 10.5755/j01.itc.49.4.26808
Is Part of Information technology and control = Informacinės technologijos ir valdymas. Kaunas : Technologija. 2020, vol. 49, no. 4, p. 482-494. ISSN 1392-124X. eISSN 2335-884X
Keywords [eng] Deep Learning ; word2vec embeddings ; part-of-speech tagging ; natural language processing ; computational linguistics ; Tigrinya language
Abstract [eng] Deep Neural Networks (DNNs) have proven to be especially successful in the area of Natural Language Processing (NLP) and Part-Of-Speech (POS) tagging, which is the process of mapping words to their corresponding POS labels depending on the context. Despite recent developments in language technologies, low-resourced languages (such as the East African language Tigrinya) have received too little attention. We investigate the effectiveness of Deep Learning (DL) solutions for the low-resourced Tigrinya language of the Northern-Ethiopic branch. We selected Tigrinya as the testbed and tested state-of-the-art DL approaches, seeking to build the most accurate POS tagger. We evaluated DNN classifiers (Feed Forward Neural Network – FFNN, Long Short-Term Memory – LSTM, Bidirectional LSTM – BiLSTM, and Convolutional Neural Network – CNN) on top of neural word2vec word embeddings with a small training corpus known as the Nagaoka Tigrinya Corpus [19]. To determine the best DNN classifier type, architecture, and hyper-parameter set, both manual and automatic hyper-parameter tuning were performed. The BiLSTM method proved to be the most suitable for our task: it achieved the highest accuracy, ~92%, which is ~65% above the random baseline.
Published Kaunas : Technologija
Type Journal article
Language English
Publication date 2020
CC license CC license description