Abstract [eng]
Prostate cancer is one of the most common types of cancer among men worldwide, including in Lithuania. According to the World Health Organization 2020 report, prostate cancer was the 4th most common type of cancer overall and the 2nd most common among men, while by number of deaths it ranked 8th overall and 5th among men. In Lithuania, prostate cancer is the most common type of cancer and ranks 4th by number of deaths overall and 2nd among men. It is therefore important for healthcare professionals to distinguish between fatal and non-fatal prostate cancer in their patients. A machine learning model can help with this, and in this work we attempt to implement such a model using data collected at Kaunas Clinics. For mortality risk estimation, four machine learning models were created: logistic regression, random forest, XGBoost and a neural network. The models were trained for four different response variables: cancer-specific mortality, death from other causes, biochemical recurrence and metastases. They were trained on a randomly sampled training set of 1251 observations and evaluated on a testing set of 313 patients. The dataset was transformed from continuous-time to discrete-time data. Model hyperparameters were tuned with Bayesian optimization using 5-fold cross-validation within the training set, and an optimal model was selected for each response variable. The random forest model achieved the best testing set AUC of the four methods on three of the four targets. For cancer-specific mortality, the average training and testing set AUC values are 0.951 (SD = 0.037) and 0.928 (SD = 0.045) respectively; for death from other causes, 0.663 (SD = 0.049) and 0.689 (SD = 0.046); and for biochemical recurrence, 0.865 (SD = 0.030) and 0.855 (SD = 0.034).
For metastases, the optimal model was XGBoost, with average training and testing set AUC values of 0.997 (SD = 0.005) and 0.927 (SD = 0.035) respectively.
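The abstract does not state how continuous follow-up time was converted into discrete-time data. One common approach is a person-period expansion: each patient contributes one row per time interval in which they were at risk, and the binary target is 1 only in the interval where the event occurred. The sketch below is an illustration of that general technique, not the procedure used in this work. It assumes fixed-width intervals, hypothetical field names (`id`, `time`, `event`) and the convention that a censored patient counts as at risk through the interval containing the censoring time.

```python
import math
from typing import Dict, List


def to_person_period(records: List[Dict], interval: float) -> List[Dict]:
    """Expand continuous-time survival records into discrete-time rows.

    Assumption: each record uses the hypothetical keys 'time' (follow-up
    length) and 'event' (1 if the outcome occurred at 'time', 0 if
    censored); any other keys (e.g. 'id', covariates) are copied to every row.
    """
    rows = []
    for rec in records:
        # Covariates carried over unchanged to each person-period row.
        covariates = {k: v for k, v in rec.items() if k not in ("time", "event")}
        # Number of intervals the patient enters; at least one row per patient.
        n_periods = max(1, math.ceil(rec["time"] / interval))
        for k in range(1, n_periods + 1):
            is_last = k == n_periods
            rows.append({
                **covariates,
                "period": k,
                # Target is 1 only in the final interval, and only for an event.
                "y": int(is_last and rec["event"] == 1),
            })
    return rows


if __name__ == "__main__":
    data = [
        {"id": 1, "time": 2.5, "event": 1},  # event during the 3rd interval
        {"id": 2, "time": 3.0, "event": 0},  # censored at the end of interval 3
    ]
    for row in to_person_period(data, interval=1.0):
        print(row)
```

The expanded rows can then be fed to any binary classifier, such as the four model types listed above, typically with `period` included as a feature so that the model can learn a separate baseline risk for each interval.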