Abstract
Tax evasion is a pressing problem that is widely analysed in scientific publications and research. Because some taxpayers fail to fulfil their obligations, less revenue is collected for the state budget, which affects all taxpayers. Research shows that Lithuania has a large shadow economy and high tax evasion rates. Preventive and control measures are taken to reduce this gap and to identify the causes of tax evasion, and this work addresses that issue. Its objective is to create a model that uses the big data available to tax authorities to identify risky legal entities, and to automate this process so that control measures can be planned and taxes collected more effectively. Based on the experience of other countries and on scientific publications, the actions that influence tax evasion by legal entities were identified, a list of features (variables) describing risky taxpayers was compiled, and practical methods for identifying tax evasion were analysed. Using this list of features, significant-feature selection methods and classification methods (random forest, neural network, nearest neighbour, extreme gradient boosting and adaptive boosting) were applied to build a tax evasion identification model for legal entities. The model was then applied in practice to anonymised legal entity data from the period 2011 to 2015. To address the imbalanced training set, three modifications were made: the data were oversampled, undersampled, and supplemented with synthetically generated records. The RFE (recursive feature elimination) method was found to be the best at selecting significant features, and extreme gradient boosting was the best-performing classification method for analysing taxpayers.
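The three rebalancing modifications are only named, not described. The following is a minimal standard-library Python sketch, assuming that "increase" and "decrease" correspond to random oversampling and undersampling, and that the additional records were generated SMOTE-style by interpolating between a minority-class row and one of its nearest minority-class neighbours. The function names, parameters and toy data are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class has as many
    rows as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.sample(idx, target):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows: pick a random minority row, pick
    one of its k nearest minority neighbours (squared Euclidean distance),
    and interpolate at a random point between them. Needs >= 2 rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        a = X_min[i]
        # All other minority rows, nearest first.
        others = sorted(
            (X_min[j] for j in range(len(X_min)) if j != i),
            key=lambda r: sum((p - q) ** 2 for p, q in zip(a, r)),
        )
        b = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from a to b, in [0, 1)
        synthetic.append([p + gap * (q - p) for p, q in zip(a, b)])
    return synthetic


# Toy data (not from the thesis): label 1 marks a "risky" legal entity.
X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [10.0, 10.0], [11.0, 10.0]]
y = [0, 0, 0, 0, 1, 1]
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
X_syn = smote_like([r for r, lab in zip(X, y) if lab == 1], n_new=2)
```

Oversampling keeps every original record but repeats minority rows, undersampling discards majority rows, and interpolation adds new minority points rather than exact copies; which of these suits tax data best is a question the abstract leaves to the full evaluation.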