Assessing the Stability and Selection Performance of Feature Selection Methods Under Different Data Complexity

  • Update: 30/06/2022

Omaimah Al Hosni

School of Engineering, University of Aberdeen, UK

Andrew Starkey

School of Engineering, University of Aberdeen, UK

Abstract: This study investigates the stability and selection accuracy of feature selection methods under different levels of data complexity. The research community has contributed significantly to understanding how complex data characteristics, such as overlapping classes or non-linear decision boundaries, affect the performance of classification algorithms; however, relatively few studies have investigated the stability and selection accuracy of feature selection methods on data with such characteristics. This study also investigates how class overlap interacts with other data challenges, namely small sample size, high dimensionality associated with irrelevant features, and class imbalance, to provide meaningful insight into the root causes of feature selection methods misidentifying the relevant features under different real-world data challenges. The analysis will be extended to real-world data to guide practitioners and researchers in choosing the feature selection methods most appropriate for a particular dataset. Our results indicate that feature selection techniques applied to datasets with different characteristics may generate different feature subsets under variations in the training data, and that, among the data challenges investigated, small sample size and overlapping classes have the greatest impact on the stability and selection accuracy of feature selection. The study also surveys the current state of research on feature selection stability to highlight the areas that require more attention from other researchers.

Keywords: Stability of feature selection, class overlapping, data characteristics, complex data.
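
The two quantities named in the abstract, stability under variations in the training data and selection accuracy, can be illustrated with a minimal sketch. This is not the authors' experimental protocol. The synthetic data generator, the toy filter selector, the stratified bootstrap, and the use of mean pairwise Jaccard similarity as the stability score are all assumptions made for illustration, and every function name below is hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def make_data(n_samples, n_relevant, n_irrelevant, overlap, rng):
    """Two-class synthetic data (assumed setup, not the paper's datasets).

    Relevant features are centred on the class label (0 or 1) with standard
    deviation `overlap`, so a larger value means more class overlap.
    Irrelevant features are pure noise and occupy the last columns.
    """
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # balanced classes
        row = [rng.gauss(label, overlap) for _ in range(n_relevant)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_irrelevant)]
        X.append(row)
        y.append(label)
    return X, y


def select_top_k(X, y, k):
    """Toy filter selector: keep the k features with the largest
    standardized gap between the two class means."""
    scores = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        col1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = pstdev(col0 + col1) or 1e-12  # guard against zero spread
        scores.append((abs(mean(col1) - mean(col0)) / spread, j))
    return frozenset(j for _, j in sorted(scores, reverse=True)[:k])


def jaccard(a, b):
    """Jaccard similarity of two non-empty feature subsets."""
    return len(a & b) / len(a | b)


def selection_accuracy(subset, n_relevant):
    """Fraction of selected features that are truly relevant
    (relevant features have indices below n_relevant)."""
    return sum(1 for j in subset if j < n_relevant) / len(subset)


def stability(X, y, k, n_boot, rng):
    """Mean pairwise Jaccard similarity of the subsets selected on
    stratified bootstrap resamples of the training data."""
    by_class = {}
    for i, lab in enumerate(y):
        by_class.setdefault(lab, []).append(i)
    subsets = []
    for _ in range(n_boot):
        # Resample with replacement inside each class so no class vanishes.
        idx = [rng.choice(members)
               for members in by_class.values() for _ in members]
        subsets.append(
            select_top_k([X[i] for i in idx], [y[i] for i in idx], k))
    return mean(jaccard(a, b) for a, b in combinations(subsets, 2))


if __name__ == "__main__":
    rng = random.Random(0)
    # Compare a larger, well-separated dataset with a small, overlapping one.
    for n_samples, overlap in [(100, 0.5), (20, 2.0)]:
        X, y = make_data(n_samples, 5, 20, overlap, rng)
        chosen = select_top_k(X, y, 5)
        print(n_samples, overlap,
              "stability:", round(stability(X, y, 5, 20, rng), 3),
              "accuracy:", round(selection_accuracy(chosen, 5), 3))
```

A stability score of 1 means every resample produced the same subset, and a score near 0 means the subsets barely intersect. Varying `n_samples` and `overlap` gives a rough, toy-scale view of the kind of comparison the abstract describes.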

Received April 10, 2022; accepted April 28, 2022

https://doi.org/10.34028/iajit/19/3A/4
