Abstract [eng] |
Since the early days of modern computer science, automatic speech recognition has been one of the biggest and hardest challenges in the field. The large majority of research in this area has focused on the most widely spoken languages, such as English, French, and Mandarin Chinese. Because of the language's complicated structure and the scarcity of data, automatic speech recognition for Lithuanian has attracted very little attention. Until now, Deep Learning methods have rarely been used to solve this problem, and phoneme-based automatic speech recognition has not been investigated at all. The purpose of this project was to create an automatic speech recognition system for the Lithuanian language that is based on Deep Learning methods and can identify spoken words purely from their phoneme sequences. This thesis surveys previous research on automatic Lithuanian speech recognition and examines the structure and behavior of Recurrent Neural Networks. Because they can work with input and output sequence pairs of variable length, two encoder-decoder models were selected for the automatic speech recognition task: a traditional encoder-decoder model and a model with an attention mechanism. The performance of these models was evaluated on isolated speech recognition and continuous speech recognition tasks. Additional experiments were conducted to determine how methods proposed by other researchers can increase accuracy. Finally, cross-validation was used to assess the predictive performance of these models. |