An Effective Sample Preparation Method for
Diabetes Prediction
Shima Afzali and Oktay Yildiz
Computer Engineering Department, Gazi University, Turkey
Abstract: Diabetes is a chronic disorder caused by a malfunction in
carbohydrate metabolism, and it has become a serious health problem
worldwide. Early and correct detection of diabetes can significantly
influence the treatment of diabetic patients and thus help avoid the
associated side effects. Machine learning is an emerging field of high importance
for providing prognosis and a deeper understanding of the classification of
diseases such as diabetes. This study proposes a high-precision diagnostic
system based on a modified k-means clustering technique. First, noisy,
uncertain, and inconsistent data are detected by the new clustering method and
removed from the data set. Then, a diabetes prediction model is generated using a
Support Vector Machine (SVM). Applying the proposed diagnostic system to the
Pima Indians Diabetes (PID) data set resulted in 99.64% classification
accuracy with 10-fold cross-validation. The results of our analysis show that the
new system outperforms both SVM alone and the classical k-means algorithm
combined with SVM in classification performance and time consumption. Experimental
results indicate that the proposed approach outperforms previous methods.
Keywords: Diabetes, clustering, classification, k-means, SVM, sample preparation.
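
The sample preparation step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify how k-means is modified, so the code below uses classical k-means and an assumed filtering rule: a sample is discarded when its class label disagrees with the majority label of its cluster. The function names (`kmeans`, `filter_samples`) and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    points = [tuple(p) for p in points]
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return assign


def filter_samples(points, labels, k=2, seed=0):
    """Assumed rule (not from the paper): keep the indices of samples whose
    label matches the majority label of the cluster they fall into."""
    assign = kmeans(points, k, seed=seed)
    majority = {
        c: Counter(l for l, a in zip(labels, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    return [i for i, (l, a) in enumerate(zip(labels, assign)) if l == majority[a]]


# Toy data: two well-separated groups, each with one inconsistent label.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

keep = filter_samples(points, labels)
print(keep)  # [0, 1, 2, 4, 5, 6]
```

In a full pipeline, the retained samples would then be used to train and evaluate the classifier, for example with scikit-learn's `SVC` and `cross_val_score(..., cv=10)` for 10-fold cross-validation. That choice of library is an assumption for illustration, not something stated in the abstract.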