Title Klasterizavimo metodų, taikomų binariniams duomenims, tyrimas
Translation of Title Analysis of clustering methods for binary data.
Authors Tamašauskas, Darius
Pages 94
Keywords [eng] cluster analysis ; binary data ; distance matrix
Abstract [eng] Cluster analysis is usually applied to data measured on a ratio scale. In this work, clustering methods are instead applied to binary data, and hierarchical and partitional clustering methods are compared in terms of their efficiency. The binary data are generated by Monte Carlo simulation: samples are drawn from a binomial distribution with parameters chosen to produce well-separated, moderately separated, and poorly separated clusters. Binary data can only be measured on a nominal scale, so specific distance measures are used to transform the data into distance matrices. Once the data have been transformed, clustering methods can be applied. Ten hierarchical methods are examined, each with ten different distance measures. We investigate how the clustering error depends on the method and distance measure, on the number of clusters, on the data distribution, and on the number of data property (feature) vectors. Among partitional methods, the popular k-means method is examined: we study how the number of clusters can be determined and how the error of the method depends on the number of clusters and on the data distribution. The algorithms used to analyse the clustering methods were implemented in the statistical analysis system SAS, and a program interface was created to make the results easier to analyse. The investigation showed that some methods perform well on binary data, while others are not very suitable.
Type Master thesis
Language Lithuanian
Publication date 2012
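
The workflow outlined in the abstract (simulate binary data, convert it to a distance matrix, cluster hierarchically, measure the error) can be illustrated with a minimal Python sketch. It is not the thesis's SAS implementation. Everything in it is an assumption made for illustration: the function names, the per-feature Bernoulli sampling, the choice of Jaccard and simple matching as distance measures, average linkage as the hierarchical method, and the error defined as the misclassified share under the best label matching. The abstract does not name the ten methods or the ten measures it studied.

```python
import random
from itertools import permutations


def simulate_binary_clusters(probs, n_per_cluster, n_features, seed=0):
    """Draw binary vectors; in cluster k each feature is 1 with probability probs[k]."""
    rng = random.Random(seed)
    data, labels = [], []
    for k, p in enumerate(probs):
        for _ in range(n_per_cluster):
            data.append([1 if rng.random() < p else 0 for _ in range(n_features)])
            labels.append(k)
    return data, labels


def jaccard_distance(x, y):
    """1 - |x AND y| / |x OR y|; two all-zero vectors get distance 0."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def simple_matching_distance(x, y):
    """Share of positions in which the two vectors disagree."""
    return sum(1 for a, b in zip(x, y) if a != b) / len(x)


def distance_matrix(data, dist):
    """Symmetric matrix of pairwise distances."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(data[i], data[j])
    return d


def average_linkage(d, n_clusters):
    """Naive agglomerative clustering: merge the pair with the smallest mean distance."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    labels = [0] * len(d)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def clustering_error(true_labels, found_labels, n_clusters):
    """Misclassified share under the best one-to-one relabelling of clusters."""
    best_hits = 0
    for perm in permutations(range(n_clusters)):
        hits = sum(1 for t, f in zip(true_labels, found_labels) if perm[f] == t)
        best_hits = max(best_hits, hits)
    return 1.0 - best_hits / len(true_labels)


if __name__ == "__main__":
    # Hypothetical parameters: two clusters with clearly different probabilities.
    data, truth = simulate_binary_clusters([0.8, 0.2], n_per_cluster=20, n_features=30)
    for name, dist in [("jaccard", jaccard_distance),
                       ("simple matching", simple_matching_distance)]:
        found = average_linkage(distance_matrix(data, dist), n_clusters=2)
        print(name, clustering_error(truth, found, n_clusters=2))
```

Moving the values in `probs` closer together gives a rough analogue of the moderately and poorly separated settings, and swapping the distance function shows how the error can depend on the chosen measure. The permutation search in `clustering_error` is only practical for a small number of clusters.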