A Novel Feature Selection Method Based on Maximum Likelihood Logistic Regression for Imbalanced Learning in Software Defect Prediction

Kamal Bashir¹, Tianrui Li¹, and Mahama Yahaya²

¹School of Information Science and Technology, Southwest Jiaotong University, China

²School of Transport and Logistics Engineering, Southwest Jiaotong University, China

Abstract: The most frequently used machine learning feature ranking approaches fail to yield an optimal feature subset for accurate prediction of defective software modules in out-of-sample data. Machine learning Feature Selection (FS) algorithms such as Chi-Square (CS), Information Gain (IG), Gain Ratio (GR), ReliefF (RF) and Symmetric Uncertainty (SU) perform relatively poorly at prediction, even after the class distribution in the training data has been balanced. In this study, we propose a novel FS method based on Maximum Likelihood Logistic Regression (MLLR). We apply this method to six software defect datasets, in both their sampled and unsampled forms, to select useful features for classification in the context of Software Defect Prediction (SDP). Support Vector Machine (SVM) and Random Forest (RaF) classifiers are then trained on the feature subsets selected from the sampled and unsampled datasets. Model performance, measured by the Area Under the Receiver Operating Characteristic Curve (AUC), is compared across all FS methods considered. The Analysis of Variance (ANOVA) F-test results validate the superiority of the proposed method over all the other FS techniques, on both sampled and unsampled data. The results confirm that MLLR can be useful in selecting an optimal feature subset for more accurate prediction of defective modules in the software development process.

Keywords: Software defect prediction, machine learning, class imbalance, maximum-likelihood logistic regression.

Received April 30, 2018; accepted January 28, 2020

https://doi.org/10.34028/iajit/17/5/5
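
As an illustration of the general idea summarized in the abstract, the following minimal sketch ranks features with a logistic regression fitted by maximum likelihood and scores a feature by AUC. It is not the authors' MLLR procedure, whose ranking criterion is not given here. The sketch assumes that features are ranked by the absolute value of their standardized coefficients, uses a hypothetical synthetic imbalanced dataset (`make_toy_data`), and omits the sampling step and the SVM and RaF classifiers. All function names are illustrative.

```python
import math
import random


def standardize(X):
    """Scale each column to zero mean and unit variance (constant columns are left centered)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


def fit_logistic_mle(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by gradient ascent on the mean log-likelihood."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = yi - 1.0 / (1.0 + math.exp(-z))  # label minus predicted probability
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


def rank_features(X, y):
    """Assumed criterion: order feature indices by |standardized coefficient|, largest first."""
    w, _ = fit_logistic_mle(standardize(X), y)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n=300, seed=0):
    """Hypothetical imbalanced data: about 20% positives, feature 0 informative, 1 and 2 noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.2 else 0
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


X, y = make_toy_data()
ranking = rank_features(X, y)
print("features ranked by |coefficient|:", ranking)
print("AUC of top-ranked feature alone:",
      round(auc([row[ranking[0]] for row in X], y), 3))
```

In a fuller experiment, the top-ranked subset would be passed to a classifier and evaluated on held-out data, with and without resampling of the training set, as the abstract describes.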
