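Veri madenciliğine genel bakış ve Random Forests yönteminin incelenmesi: Sağlık alanında bir uygulama (An overview of data mining and an examination of the Random Forests method: An application in the health field)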
Date
2010
Publisher
Sağlık Bilimleri Enstitüsü
Abstract
Data mining is used to help policy makers reach valid and efficient decisions from the data available on a subject. In general, data mining has descriptive and predictive perspectives; in medicine, its predictive aspects in particular are used.

Within this thesis, data mining techniques are introduced briefly. Decision trees, a class of classification models with an important place in data mining, are then explained. The tree-based data mining method Random Forests (RF) is also analyzed and applied to a periodontology data set.

In the RF method, the decision trees that form the decision forest are built from different data sets, which are bootstrap samples drawn from the original data set. In addition, each decision tree is built using a smaller, randomly selected subset of all the predictors. Each decision tree votes for one class; the forest aggregates the votes from all trees and makes the final class decision. Owing to these properties, RF gives fairly good results.

Using the RF method, a successful classification rate of 95.4% was achieved. The decision forest's error rate was found to be 3.33%. The same data set was also classified with the Bagging and CART methods, for which the error rates were 5.4% and 8.75%, respectively.

With the RF method, a lower classification error rate is generally achieved even when there are many predictors and a large amount of data. Because RF is an ensemble method, it tends to give better results than a single decision tree. It can be used to identify the important predictors in large DNA data sets that have thousands of predictors (genes).
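The three ingredients described in the abstract (bootstrap samples, a random subset of predictors per tree, and majority voting) can be illustrated with a minimal sketch. It is not the thesis implementation. The toy data, the function names, and the use of one-split trees ("stumps") in place of full decision trees are all assumptions made for brevity. In Breiman's Random Forests the predictor subset is redrawn at every node of a fully grown tree; with a single split per tree the two schemes coincide.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement from the original data set."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows, feature_ids):
    """Fit a one-split tree using only the candidate features.

    Returns (feature, threshold, left_label, right_label), chosen to
    minimise the number of misclassified training rows.
    """
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_lab, r_lab)
    if best is None:  # no usable split: fall back to a constant predictor
        label = majority([y for _, y in rows])
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    f, threshold, l_lab, r_lab = stump
    return l_lab if x[f] <= threshold else r_lab


def fit_forest(rows, n_trees=25, n_features=1, seed=0):
    """Build n_trees stumps, each on its own bootstrap sample and
    its own random subset of n_features predictors."""
    rng = random.Random(seed)
    all_features = range(len(rows[0][0]))
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)
        feature_ids = rng.sample(all_features, n_features)
        forest.append(fit_stump(sample, feature_ids))
    return forest


def predict_forest(forest, x):
    """Each tree votes for one class; the forest returns the majority."""
    return majority([predict_stump(stump, x) for stump in forest])


# Invented toy data (not the periodontology data set): two predictors,
# two well-separated classes labelled "A" and "B".
rows = [
    ((0.8, 1.1), "A"), ((1.0, 0.9), "A"), ((1.2, 1.3), "A"), ((1.1, 0.7), "A"),
    ((4.8, 5.3), "B"), ((5.0, 4.9), "B"), ((5.2, 5.1), "B"), ((5.1, 4.7), "B"),
]
forest = fit_forest(rows)
print(predict_forest(forest, (0.5, 0.5)), predict_forest(forest, (6.0, 6.0)))
```

Setting `n_features` to the total number of predictors removes the random predictor selection and leaves only the bootstrap and voting steps, which is the Bagging method the abstract compares against.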
Keywords
Data mining, Health