Title |
The effect of author set size in authorship attribution for Lithuanian |
Authors |
Kapočiūtė-Dzikienė, Jurgita ; Šarkutė, Ligita ; Utka, Andrius |
ISBN |
9789175190983 |
Is Part of |
NODALIDA 2015 : proceedings of the 20th Nordic Conference of Computational Linguistics, May 11–13, 2015, Institute of the Lithuanian Language, Vilnius / editor Beata Megyesi. Linköping : Linköping University Electronic Press, 2015. p. 87–96. ISBN 9789175190983 |
Keywords [eng] |
Authorship attribution ; Parliamentary transcripts |
Abstract [eng] |
This paper reports the first results on the effect of author set size on authorship attribution for the Lithuanian language using automatic computational methods. The aim is to determine how quickly authorship attribution results deteriorate as the number of candidate authors gradually increases, i.e., from 3 to 5, 10, 20, 50, and 100. Using supervised machine learning techniques, we also investigated the influence of different feature types (lexical, character, morphological, etc.) and language types (normative parliamentary speeches and non-normative forum posts). The experiments revealed that the effectiveness of the method and of the feature types depends more on the language type than on the number of candidate authors. Content features based on word lemmas are the most useful type for the normative texts, because Lithuanian is a highly inflective language with rich morphology and vocabulary. Character features are the most accurate type for forum posts, where the texts are too complicated to be processed effectively with external morphological tools. |
Published |
Linköping : Linköping University Electronic Press, 2015 |
Type |
Conference paper |
Language |
English |
Publication date |
2015 |