Title Robastinių klastertizavimo metodų plėtojimas
Translation of Title Development of robust clustering algorithms.
Authors Lukauskas, Mantas
Pages 222
Keywords [eng] machine learning ; clustering ; unsupervised learning ; density rating ; robust
Abstract [eng] The ever-increasing complexity and scale of modern data sets have highlighted the critical need for effective and efficient data analysis methods. As a result, clustering algorithms have become a key tool in machine learning and data mining for processing and interpreting large amounts of data without any a priori information. In recent years, clustering has found applications in fields such as bioinformatics, image processing, natural language processing, social network analysis, and anomaly detection. Robust data clustering has become a focus of extensive research in response to the challenges posed by real-world datasets. These include noise, extreme values, outliers, missing or corrupted data, and the presence of different types and scales of data. Real-world datasets can also have complex geometries on which traditional clustering methods perform poorly. Robust clustering therefore aims to overcome these limitations by using advanced techniques to handle diverse data types and reveal their complex structures. This study aims to develop efficient data clustering methods and to investigate them against other currently available clustering methods in the case of heterogeneous data. Methods of probability theory, mathematical statistics, data dimensionality reduction, and visualization are applied in the dissertation. The presented clustering methods are based on the inversion formula. Python, R, PostgreSQL, Airflow, dbt, and other tools were used for the software implementation of this work. The developed clustering methods produce better results than other popular methods. They were applied in the scientific and practical activities of two companies. The data clustering methods and the recommendations for their application have been used in research by Farmer et al. (2023), Powroźnik et al. (2022), Yu et al. (2023), and Chen et al.
Dissertation Institution Kauno technologijos universitetas.
Type Doctoral thesis
Language Lithuanian
Publication date 2024