Abstract [eng] |
Transcribing telephone speech into text is a relevant task today. It can serve natural language research or the extraction of context from sound recordings, since recorded speech must be converted to text before its content can be analysed. Commercial solutions are usually not tailored to telephone speech. Moreover, most major speech-to-text service providers are not interested in developing solutions for languages with a relatively small number of speakers, such as Lithuanian. For these reasons, and because annotated data from Lithuanian telephone calls is scarce, little research has been done on this task. Nevertheless, speech-to-text transcription is attracting growing attention, driven by the desire to automate service delivery and simplify device input, and by recent advances in deep learning algorithms. This work reviews existing speech-to-text model architectures and investigates the best-performing speech-to-text models for training on telephone call data, taking into account criteria such as accuracy, inference speed, and training speed. It also proposes a methodology that adds a grammatical error correction algorithm with a frequency dictionary search to improve the results. The study determined the accuracy of a commercial method and of two models trained on a data set of Lithuanian calls. Inference speed was compared across two NVIDIA graphics cards. The difference in accuracy between the original methods and the same methods extended with the proposed word frequency dictionary search was also calculated. |