Abstract [eng] |
Due to an unprecedented increase in data worldwide, the financial sector and other industries are striving to remain competitive by transforming themselves into data-driven organizations. By analyzing large volumes of data, organizations can gain valuable information to shape their strategic plans in areas such as risk control, crisis management, or growth management. For example, one of the biggest problems for banks is determining the creditworthiness of their customers: whether these customers will repay the loans granted to them on time, or at all. Banks now hold large amounts of data that can be used to construct models that predict this. Such model-based forecasting allows for much faster data analysis and does not require expert judgment, which is considerably more expensive. This presentation will provide information about different machine learning algorithms and their use in clustering bank customers and subsequently classifying them. The analysis was performed on data from the SEB Big Data Challenge. In total, this dataset contains data on more than ten million customers, and each observation (customer) is described by about twenty different indicators (characteristics). In the first stage of the study, the data were preprocessed: missing values were filled in using different methods, the data set was balanced, and different normalization algorithms were compared. In the second stage, the data were clustered (k-means, DBSCAN, OPTICS, etc.), and customer classification models were then developed on the basis of the resulting clusters. Different classification algorithms (artificial neural networks, XGBoost, LightGBM, CatBoost, etc.) were used in the study. The algorithms were compared in terms of reliability, accuracy, and computation time to determine which performed best. 
The results of this study make it possible to compare how different data processing techniques, clustering algorithms, and classification algorithms affect the accuracy of customer classification. |
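The first two stages described in the abstract (filling in missing values, normalization, and clustering) can be illustrated with a minimal sketch. This is not the study's implementation: the abstract does not say which imputation or normalization methods were used, so mean imputation and min-max scaling are assumptions chosen for simplicity, the function names are hypothetical, and the six toy rows (monthly income, number of late payments) are invented for illustration and are not SEB Big Data Challenge data. The classification stage is not covered.

```python
import math
import random


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


def k_means(rows, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) with randomly sampled initial centroids."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(row, centroids[c]))
                  for row in rows]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented toy data: [monthly income, number of late payments]; None = missing.
customers = [
    [1200.0, 0], [1500.0, 1], [None, 0],
    [9000.0, 7], [9500.0, None], [8800.0, 6],
]

imputed = impute_mean(customers)
scaled = min_max_scale(imputed)
labels, centroids = k_means(scaled, k=2)
print(labels)
```

In a setting like the one described, the cluster labels could then be passed to a classifier as an additional feature, or a separate classifier could be trained per cluster; the abstract does not specify which approach the study took.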