Abstract [eng] |
In this paper, the detection performance of various churn and survival prediction models is analysed. The analysis is based on comparing the goodness of fit of each group of prediction models. Five different detection and survival models were examined: logit, random forest, SVM, Cox proportional hazards, and random survival forest. Almost all of them were analysed with and without a majority down-sampling procedure and with various parametric regularization procedures. Down-sampling creates a balanced dataset by drawing a random sample from the majority class whose size matches the number of samples in the minority class. Parametric regularization consists of estimating each model's parameters. The methodology is illustrated with an experiment based on two datasets. Models were trained and tested on customer data from two firms in different industries: a telecommunication service provider and a “Premium” club providing extraordinary services. The research used a cross-validation strategy. Results are reported with confusion matrices and various detection measures of model fit. Detector performance curves, such as receiver operating characteristic (ROC), detection error tradeoff (DET), precision-recall, and lift plots, are also supplied. The results show that random forest combined with a dataset balancing procedure outperforms the other detection and survival methods. The outcomes also demonstrate that, for these datasets, detection models identify high-risk customers significantly better than survival models. The results indicate that churn models should be used for predicting customer behavior. Two methods for setting the decision threshold were evaluated: the threshold at the equal error rate, where sensitivity equals specificity, and the threshold that optimizes the expected maximum profit. Different thresholds give different confusion matrices. 
The analysis also concludes that an improvement in financial profit is not associated with an improvement in the number of churners targeted or with optimizing the correctness of detection and survival models. |
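
The two procedures the abstract describes in words, majority down-sampling and the equal error rate threshold, can be sketched as follows. This is a minimal illustration and not the code used in the study. The function names `downsample_majority` and `equal_error_rate_threshold`, the binary 0/1 label coding, and the toy scores are assumptions made for this example. The profit-based threshold is not shown, because it depends on cost and benefit parameters that the abstract does not state.

```python
import random


def downsample_majority(rows, labels, seed=0):
    """Balance a binary dataset by randomly down-sampling the majority class.

    Hypothetical helper: keeps every minority-class row and draws an equally
    sized random sample (without replacement) from the majority class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def equal_error_rate_threshold(scores, labels):
    """Return the observed score at which sensitivity is closest to specificity.

    A customer is predicted to churn when score >= threshold. Only scores that
    occur in the data are tried as candidate thresholds.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_gap = None, float("inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        gap = abs(tp / n_pos - tn / n_neg)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t


# Toy data invented for illustration: 2 churners (label 1), 6 non-churners.
labels = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.35, 0.05]
rows = list(range(len(labels)))

balanced_rows, balanced_labels = downsample_majority(rows, labels)
threshold = equal_error_rate_threshold(scores, labels)
print(len(balanced_labels), sum(balanced_labels), threshold)
```

On real data the two classes rarely reach exactly equal sensitivity and specificity at an observed score, so the sketch returns the closest candidate rather than interpolating between scores.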