Analyzing the Behavior of Multiple Dimensionality Reduction Algorithms to Obtain Better Accuracy using Benchmark KDD CUP Dataset
Suriya Prakash Jambunathan1, Suguna Ramadass2, and Palanivel Rajan Selva kumaran3
1Faculty of Information and Communication Engineering, Anna University, India
2Department of Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, India
3Department of Electronics and Communication Engineering, M. Kumarasamy College of Engineering, India
Abstract: In the ubiquitously connected world of IT infrastructure, the Intrusion Detection System (IDS) plays a vital role. An IDS is considered a critical component of security infrastructure; it is implemented through hardware or software devices and detects malicious activities in a networked environment. To detect or prevent network attacks, a Network Intrusion Detection (NID) system may be equipped with machine learning algorithms to achieve better accuracy and faster detection speed. Dimensionality reduction algorithms are an efficient mechanism for analyzing different attacks effectively: they improve feature selection from huge datasets and thereby enhance learning speed. Speed is a crucial parameter in the success of network intrusion detection systems, because defensive reactions depend on timely detection. In this paper, two open-source datasets, the Knowledge Discovery in Databases (KDD CUP) dataset and the 10% KDD CUP dataset, are employed for experimentation. These datasets are provided to dimensionality reduction algorithms, namely Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Kernel PCA with different kernels, and the reduced data are classified with the Logistic Regression classification algorithm. The K-fold algorithm is then utilized to further improve the accuracy achieved. Finally, the accuracy results obtained with and without the K-fold algorithm are compared. The empirical study on KDD CUP data confirms the effectiveness of the proposed scheme. We find that combining multiple dimensionality reduction algorithms (PCA, LDA, and Kernel PCA) with a classification algorithm gives the best results. Our study will help researchers investigate critical areas such as intrusion detection in network traffic environments.
The results we identified will be helpful to researchers in their future work on the KDD CUP dataset. The main finding of this research is that the best accuracy was achieved on the 10% KDD CUP dataset: PCA attained 98% without KFold and 99% with KFold, and LDA likewise attained 98% without KFold and 99% with KFold.
Keywords: Intrusion attacks, network, features, accuracy.
Received December 14, 2020; accepted August 17, 2021
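The workflow summarized in the abstract (a dimensionality reduction step, a Logistic Regression classifier, and a comparison of accuracy with and without K-fold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, reads "K-fold" as K-fold cross-validation, replaces the KDD CUP data with a synthetic dataset, and the component counts, the RBF kernel, the 75/25 split, and the 10 folds are all hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic binary-labelled data standing in for KDD CUP records.
X, y = make_classification(
    n_samples=600, n_features=41, n_informative=10, random_state=0
)

# Assumption: illustrative reducer settings, not the paper's configuration.
reducers = {
    "PCA": PCA(n_components=2),
    "LDA": LinearDiscriminantAnalysis(n_components=1),  # at most n_classes - 1
    "KernelPCA-rbf": KernelPCA(n_components=2, kernel="rbf"),
}

# Single hold-out split, used for the "without K-fold" accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

results = {}
for name, reducer in reducers.items():
    # Scale, reduce dimensionality, then classify with Logistic Regression.
    model = make_pipeline(
        StandardScaler(), reducer, LogisticRegression(max_iter=1000)
    )
    # Accuracy on one hold-out split.
    holdout_acc = model.fit(X_train, y_train).score(X_test, y_test)
    # Mean accuracy over 10 cross-validation folds.
    kfold_acc = cross_val_score(model, X, y, cv=10).mean()
    results[name] = (holdout_acc, kfold_acc)
    print(f"{name}: hold-out={holdout_acc:.3f}, 10-fold mean={kfold_acc:.3f}")
```

Wrapping the scaler and the reducer in one pipeline means each is re-fitted on the training portion of every fold, so no information from the held-out fold leaks into the projection. The numbers printed for this synthetic data say nothing about the accuracies reported for the KDD CUP datasets.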