Abstract [eng] |
Authorship identification of electronic discourse has become a significant problem because of the growing number of anonymous cybercrimes. Successfully and efficiently identifying the author of an unknown text written in electronic form is often an especially important part of investigating internet crimes. Many language-independent general linguistic features and characteristics have already been defined for identifying the authors of anonymous texts written in any language (with exceptions). Many methods and system prototypes also exist for authorship identification of texts written in English; these use the general features and characteristics together with additions specific to English. For most other national languages this area is less developed, although it has recently been actively improving. This work examines various methods, tools, and linguistic characteristics that make it possible to determine the author of an anonymous text from an existing database of texts by identified authors. Their applicability to the Lithuanian language and their effectiveness are explored, and additional language-dependent features are suggested and tested for effectiveness. An innovative method and system prototype for authorship identification of anonymous texts written in Lithuanian are also proposed, using both general and additional Lithuanian-specific features. The proposed system was evaluated in experiments on a database containing texts by 200 different authors. The experiments were carried out first with the general linguistic features only, and then with the general and Lithuanian-specific features together. The experiments showed that adding the Lithuanian-specific features significantly improves the accuracy of authorship identification compared with using the general features alone. |