Rough Set-Based Reduction of Incomplete Medical Datasets by Reducing the Number of Missing Values
Luai Al Shalabi
Faculty of Computer Studies, Arab Open University, Kuwait
Abstract: This paper proposes a model with two aims: first, dimensionality
reduction of noisy medical datasets based on minimizing the number of missing
values, which is achieved by splitting the original dataset; second, a
high-quality generated reduct. The original dataset is split into two subsets:
the first contains the complete records, and the second contains imputed records
that previously had missing values. The rough set reducts of the two subsets are
merged. The reduct of the merged attributes is then constructed and tested using
Rule Based and Decomposition Tree classifiers. The Hepdata dataset, in which 59%
of the tuples have one or more missing values, is the main dataset used
throughout this article. The proposed algorithm performs effectively and the
results are as expected. The dimension of the reduct generated by the Proposed
Model (PM) is 10% smaller than that of the Rough Set Model (RSM). The proposed
model was also tested on other incomplete medical datasets. Significant and
insignificant differences between RSM and PM are shown in Tables 1-5.
Keywords: Data mining, rough set theory, missing values, reduct.
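
The split, impute, reduce, and merge steps outlined in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and several details are assumptions made only for illustration: missing values are marked with None, imputation uses the most frequent observed value of each column (the abstract does not name an imputation method), a reduct is found by exhaustive search for the smallest attribute subset that preserves the size of the rough set positive region, and "merging" is read as taking the union of the two reducts before reducing again on the combined data. The function names (split_records, impute_mode, reduct, proposed_reduct) are hypothetical.

```python
from collections import Counter
from itertools import combinations

MISSING = None  # assumed marker for a missing value


def split_records(rows):
    """Split rows into (complete, incomplete) by the presence of MISSING."""
    complete = [r for r in rows if MISSING not in r]
    incomplete = [r for r in rows if MISSING in r]
    return complete, incomplete


def impute_mode(incomplete, reference):
    """Fill each missing cell with the most frequent observed value of its
    column (assumed imputation method; the column must have observed values)."""
    pool = reference + incomplete
    filled = []
    for row in incomplete:
        new_row = list(row)
        for j, value in enumerate(row):
            if value is MISSING:
                observed = [r[j] for r in pool if r[j] is not MISSING]
                new_row[j] = Counter(observed).most_common(1)[0][0]
        filled.append(tuple(new_row))
    return filled


def positive_region_size(rows, attrs, decision):
    """Count rows whose equivalence class under attrs has a single decision."""
    decisions = {}
    for r in rows:
        key = tuple(r[a] for a in attrs)
        decisions.setdefault(key, set()).add(r[decision])
    return sum(1 for r in rows
               if len(decisions[tuple(r[a] for a in attrs)]) == 1)


def reduct(rows, candidates, decision):
    """Smallest subset of candidates with the same positive region size as
    the full candidate set (exhaustive search, small tables only)."""
    candidates = sorted(candidates)
    target = positive_region_size(rows, candidates, decision)
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            if positive_region_size(rows, subset, decision) == target:
                return list(subset)
    return candidates


def proposed_reduct(rows, decision):
    """Split, impute, reduce each subset, merge the reducts, reduce again."""
    attrs = [a for a in range(len(rows[0])) if a != decision]
    complete, incomplete = split_records(rows)
    imputed = impute_mode(incomplete, complete)
    merged = (set(reduct(complete, attrs, decision))
              | set(reduct(imputed, attrs, decision)))
    return reduct(complete + imputed, merged, decision)


# Toy table: columns 0-2 are condition attributes, column 3 is the decision.
rows = [
    (1, 0, 1, "yes"),
    (1, 1, 0, "yes"),
    (0, 0, 1, "no"),
    (0, 1, 0, "no"),
    (1, None, 1, "yes"),
    (0, 1, None, "no"),
]
final_reduct = proposed_reduct(rows, decision=3)
```

The exhaustive search is exponential in the number of attributes, so it is only suitable for small illustrative tables; a practical rough set tool would use a heuristic or discernibility-matrix approach instead.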