Title Lietuviškų naujienų grupavimo algoritmų tyrimas
Translation of Title Clustering of Lithuanian News Articles.
Authors Pranckaitis, Vilius
Pages 48
Keywords [eng] document clustering ; feature selection ; Lithuanian news articles ; k-means ; hierarchical clustering
Abstract [eng] This work studies the application of document clustering to news articles from three major Lithuanian news sites. Several aspects of clustering are examined, including feature selection and a comparison of the k-means and hierarchical clustering algorithms. The study proposes a metric for measuring how well particular words describe the contents of a cluster. In addition, it proposes a two-level clustering method that combines the hierarchical and k-means algorithms. The results show that TF–IDF weighting with stemming produces significantly better results than plain TF weighting or omitting stemming. The k-means algorithm also produced higher-quality clusterings than the hierarchical methods and was less sensitive to feature space reduction. The proposed two-level clustering showed promising results; however, its clustering quality did not match that of k-means.
Dissertation Institution Kaunas University of Technology (Kauno technologijos universitetas)
Type Master's thesis
Language Lithuanian
Publication date 2017
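The abstract above compares stemming, TF versus TF–IDF weighting, and k-means clustering, but the record does not describe the thesis's implementation. The following is a minimal, stdlib-only Python sketch of that general pipeline under stated assumptions: whitespace tokenization, a hypothetical naive suffix stripper (`naive_stem`, not a real Lithuanian stemmer), cosine similarity on L2-normalised sparse vectors, and spherical k-means with deterministic farthest-first seeding. None of these choices, names, or parameters are taken from the thesis.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL ending list: a crude stand-in, not a real Lithuanian stemmer.
SUFFIXES = ("ams", "as", "is", "us", "ai", "os", "ės", "ų", "a", "e", "o", "s")


def naive_stem(word, min_stem=4):
    """Strip the first matching ending, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def tokenize(text, stem=True):
    """Lower-case whitespace tokenization that keeps alphabetic tokens only."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return [naive_stem(t) for t in tokens] if stem else tokens


def weight_vectors(docs, use_idf=True):
    """Turn token lists into L2-normalised sparse vectors with TF or TF-IDF weights."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        vec = {}
        for term, count in Counter(doc).items():
            w = count * (math.log(n / df[term]) if use_idf else 1.0)
            if w:  # terms present in every document get IDF 0 and are dropped
                vec[term] = w
        norm = math.sqrt(sum(x * x for x in vec.values()))
        vectors.append({t: x / norm for t, x in vec.items()} if norm else {})
    return vectors


def dot(a, b):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def normalised_mean(vectors):
    """Mean direction of the given vectors, rescaled to unit length."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}


def kmeans(vectors, k, max_iter=50):
    """Spherical k-means seeded with the documents least similar to earlier seeds."""
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(range(len(vectors)),
                         key=lambda i: max(dot(vectors[i], vectors[j]) for j in seeds)))
    centroids = [vectors[i] for i in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = normalised_mean(members)
    return labels


if __name__ == "__main__":
    texts = [
        "krepšinis rungtynės komanda laimėjo",
        "komanda laimėjo krepšinis finalas",
        "seimas priėmė įstatymas balsavimas",
        "seimas balsavimas ministras įstatymas",
    ]
    docs = [tokenize(t) for t in texts]
    print(kmeans(weight_vectors(docs), k=2))  # → [0, 0, 1, 1]
```

Switching `use_idf=False` or `stem=False` gives the plain-TF and unstemmed variants the abstract contrasts; on a real corpus one would compare the resulting clusterings with a quality measure rather than by inspection.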