Maximum Spanning Tree Based Redundancy
Elimination for Feature Selection of High
Dimensional Data
Bharat Singh and Om Prakash Vyas
Department of Information Technology, Indian Institute of
Information Technology-Allahabad, India
Abstract: Feature selection is a preprocessing step
for high-dimensional data that aims to improve results in terms of
speed and time. It is a technique by which the most prominent features
can be selected from a set of features that may contain redundant as well as
relevant features. It also lightens the burden on classification
techniques, making them faster and more efficient. We introduce a novel two-tiered
feature selection architecture that can identify relevant features and filter out
redundant ones. Our approach exploits the ability to identify
highly correlated nodes in a tree. The reduced dataset
comprises the features selected in this way. Finally, the reduced dataset is tested
with various classification techniques to evaluate their performance. To
validate the approach, we use several basic classification algorithms to
highlight its benefits. The experiments use
benchmark datasets to demonstrate the effectiveness of our approach.
Keywords: Data mining, feature selection, tree-based
approaches, maximum spanning tree, high-dimensional data.
Received February 15, 2015; accepted December 21, 2015