Abstract [eng] |
As the amount of information published on the Internet keeps growing, meaningful search in unstructured text is becoming more important. Natural language processing (NLP) systems play an important role in implementing semantic, as opposed to merely keyword-based, search. In 2015, the Information System for Syntactic and Semantic Analysis of the Lithuanian Language (Lith. Lietuvių kalbos sintaksinės ir semantinės analizės informacinė sistema, LKSSAIS) was developed in the course of the project Semantika-LT, implemented by Vytautas Magnus University and Kaunas University of Technology; the system provides users with NLP and semantic text search services. However, the information the system produces permits only limited recognition of information in texts. Better results could be expected if semantic text analysis included information on anaphoras. The first attempt to develop anaphora resolution for the Lithuanian language during Semantika-LT showed that such resolution must be developed gradually, with its quality constantly assessed and analysed. To do this efficiently, an automated assessment and analysis tool is essential. Since no anaphora resolution assessment tools exist for the Lithuanian language, the tools and methods used for other languages, and the possibilities of applying them to Lithuanian, were analysed in detail. It was established that the same anaphora resolution assessment and analysis methods may be used; however, because anaphoras are classified differently in different languages, the Lithuanian language requires a tool adapted to it. A decision was made to develop a separate, independent tool, because adapting an existing, poorly suited tool would possibly take more time than developing a new one. After a requirements specification had been prepared and the implementation designed, the prototype tool ASAS was developed. 
The experiment showed that the developed tool makes it possible to analyse and assess anaphora resolution in the Lithuanian language. During the experiment, the ASAS tool was used to create a Lithuanian gold corpus and to evaluate the existing anaphora resolution for the Lithuanian language. Currently, automated anaphora resolution methods for Lithuanian are being developed at the Department of Information Systems of Kaunas University of Technology (a follow-up of the project Semantika-LT). The prototype ASAS tool developed in the course of this thesis could be used to assess and analyse them. |