Feature Selection Algorithm Based on Correlation between Multi-Metric Network Traffic Flow
Features
Yongfeng Cui1,2, Shi Dong1,2,3, and Wei Liu2
1School of Computer Science and Technology, Huazhong University of Science and Technology, China
2School of Computer Science and Technology, Zhoukou Normal University, China
3Department of Computer Science and Engineering, Washington University in St Louis, USA
Abstract: Traffic identification has been a hot research topic in recent years.
To overcome the shortcomings of port-based identification and Deep Packet
Inspection (DPI), machine learning algorithms have gained wide attention.
However, current research focuses on traffic identification based on full-packet
datasets, which makes identifying online traffic flows a great challenge. One
way to overcome this shortcoming is to treat sampled flow records as the
identification object. In this paper, the flow record dataset NOC_SET is
constructed, and the inherent NETFLOW metrics and extended flow metrics are used
as features. The paper proposes a feature selection algorithm, MSAS, to select
features with high correlation, and classical machine learning algorithms are
then used to identify traffic. Experimental results show that machine learning
flow identification based on sampled flow records achieves almost the same
identification results as methods based on full-packet datasets, and that the
proposed feature selection algorithm MSAS can improve the results of application
identification.
Keywords: Port identification, deep packet inspection, NetFlow flow, machine learning.
Received February 5, 2014; accepted April 2, 2015