Rough Set-Based Reduction of Incomplete Medical Datasets by Reducing the Number of Missing Values

Luai Al Shalabi

Faculty of Computer Studies, Arab Open University, Kuwait

Abstract: This paper proposes a model with two aims: first, dimensionality reduction of noisy medical datasets based on minimizing the number of missing values, which is achieved by splitting the original dataset; and second, a high-quality generated reduct. The original dataset was split into two subsets: the first contains the complete records, and the second contains imputed records that previously had missing values. The reducts of the two subsets, computed using rough set theory, are merged. The reduct of the merged attributes was then constructed and tested using Rule-Based and Decomposition Tree classifiers. The Hepdata dataset, in which 59% of the tuples have one or more missing values, is the main dataset used throughout this article. The proposed algorithm performs effectively, and the results are as expected. The dimension of the reduct generated by the Proposed Model (PM) is 10% smaller than that of the Rough Set Model (RSM). The proposed model was also tested against other incomplete medical datasets. Significant and insignificant differences between RSM and PM are shown in Tables 1-5.

Keywords: Data mining, rough set theory, missing values, reduct.

Received October 9, 2015; accepted August 24, 2016
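
The pipeline summarized in the abstract (split the data, impute the incomplete part, compute a reduct per subset, merge, then reduce again) can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several choices are assumptions made only for illustration: missing values are marked with `None`, imputation uses the most frequent value of each column, a reduct is found by brute-force search for the smallest attribute subset that preserves the size of the positive region, and the final reduct is computed over the union of the complete and imputed records. The toy table and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

MISSING = None  # assumed marker for a missing value


def split_records(rows):
    """Separate records with no missing values from records with at least one."""
    complete = [r for r in rows if MISSING not in r]
    incomplete = [r for r in rows if MISSING in r]
    return complete, incomplete


def impute_mode(incomplete, all_rows):
    """Fill each missing cell with the most frequent observed value of its column
    (mode imputation is an assumption; the abstract does not name the method)."""
    n_cols = len(all_rows[0])
    modes = []
    for c in range(n_cols):
        observed = Counter(r[c] for r in all_rows if r[c] is not MISSING)
        modes.append(observed.most_common(1)[0][0])
    return [tuple(modes[c] if v is MISSING else v for c, v in enumerate(r))
            for r in incomplete]


def positive_region_size(rows, attrs):
    """Count records whose equivalence class (under attrs) has a single decision.
    The decision is assumed to be the last element of each record."""
    decisions, counts = {}, Counter()
    for r in rows:
        key = tuple(r[a] for a in attrs)
        decisions.setdefault(key, set()).add(r[-1])
        counts[key] += 1
    return sum(counts[k] for k, d in decisions.items() if len(d) == 1)


def reduct(rows, attrs):
    """Brute-force search for a smallest subset of attrs that keeps the
    positive region as large as it is under all of attrs."""
    attrs = sorted(attrs)
    target = positive_region_size(rows, attrs)
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            if positive_region_size(rows, subset) == target:
                return set(subset)
    return set(attrs)


# Hypothetical toy table: three condition attributes followed by a decision.
rows = [
    (1, 0, 0, "yes"),
    (1, 1, 0, "yes"),
    (0, 0, 1, "no"),
    (0, 1, 1, "no"),
    (None, 0, 0, "yes"),
    (0, None, 1, "no"),
]
condition_attrs = [0, 1, 2]

complete, incomplete = split_records(rows)
imputed = impute_mode(incomplete, rows)

reduct_complete = reduct(complete, condition_attrs)
reduct_imputed = reduct(imputed, condition_attrs)
merged_attrs = reduct_complete | reduct_imputed

# Assumption: the final reduct is taken over the merged attributes on all records.
final_reduct = reduct(complete + imputed, merged_attrs)
print(reduct_complete, reduct_imputed, merged_attrs, final_reduct)
```

The exhaustive search is exponential in the number of attributes, so it is only suitable for small illustrations such as this one; practical rough set tools rely on heuristics or discernibility-based methods instead.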