A data mining methodology with preprocessing steps

Vita Špečkauskienė; Arūnas Lukoševičius

Title	A data mining methodology with preprocessing steps
Another Title	Duomenų gavybos ir analizės metodika, apimanti pirminio apdorojimo veiksmus [A data mining and analysis methodology including preprocessing steps]
Authors	Špečkauskienė, Vita ; Lukoševičius, Arūnas
Is Part of	Informacinės technologijos ir valdymas = Information technology and control. Kaunas : Technologija. 2009, vol. 38, no. 4, p. 319-324. ISSN 1392-124X. eISSN 2335-884X
Keywords [eng]	Feature selection ; Optimal data set ; Data set quality ; Data mining ; Classification ; Clinical decision support
Abstract [eng]	This paper analyzes problems that arise when performing data mining and discusses issues of data quality. The main focus is on feature selection and its influence on classification results. Feature selection, or the discovery of an optimal data set, is the process of removing from the data set those features that are not useful in decision making and keeping the most useful ones. The influence of feature selection is analyzed for different classification algorithms, which are applied to two data sets of different composition to solve three problems in the medical domain. The results show that no universal algorithm can solve every problem, and that each data set has its own optimal (sub)set of features for a given classification algorithm. Methodological recommendations are given for reaching a near-optimal solution in clinical decision support.
Published	Kaunas : Technologija
Type	Journal article
Language	English
Publication date	2009
