A Statistical Framework for Identification of Tunnelled Applications using Machine Learning
Ghulam Mujtaba1 and David Parish2
1Department of Electrical Engineering, COMSATS Institute of Information Technology, Pakistan
2School of Electronic and Engineering, Loughborough University, UK
Abstract: This work describes a statistical approach to detecting applications that run inside application layer tunnels. Application layer tunnels pose a significant risk of network abuse and of violation of an organisation's acceptable internet usage policy. In tunnelling, the packets of a prohibited application are encapsulated as the payload of packets of an allowed protocol. Conventional methods have great difficulty identifying tunnelling when the tunnel is encrypted, as in the case of HTTPS tunnels. Hence, a machine learning based approach is presented in this work, in which statistical features of the packet stream are used to identify the application inside a tunnel. The Packet Size Distribution (PSD), expressed as discrete bins, is an important feature that is shown to be indicative of the respective application. This work combines other features with the PSD bins to improve identification of the applications, and shows that tunnelled applications are identifiable from these statistical traffic parameters. A comparison of the accuracy of five machine learning algorithms for application detection using this feature set is also given.
Keywords: Network security, tunnelled applications, firewalls, HTTP tunnels, HTTPS tunnels.
Received May 22, 2013; Accepted May 17, 2015
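As an illustration of the PSD feature described in the abstract, the following minimal sketch shows one way a packet size distribution could be reduced to discrete bins. The function name `psd_bins`, the bin width of 100 bytes and the maximum size of 1500 bytes are assumptions made for this example only; they are not taken from the paper, which may use a different binning scheme.

```python
from typing import List, Sequence


def psd_bins(packet_sizes: Sequence[int],
             bin_width: int = 100,
             max_size: int = 1500) -> List[float]:
    """Return the fraction of packets falling in each fixed-width size bin.

    bin_width and max_size are illustrative assumptions, not values
    taken from the paper.
    """
    if bin_width <= 0 or max_size <= 0:
        raise ValueError("bin_width and max_size must be positive")

    # Number of bins needed to cover sizes 0..max_size (ceiling division).
    n_bins = -(-max_size // bin_width)
    counts = [0] * n_bins

    for size in packet_sizes:
        # Clamp negative sizes to the first bin and oversized packets to the last.
        idx = min(max(size, 0) // bin_width, n_bins - 1)
        counts[idx] += 1

    total = len(packet_sizes)
    if total == 0:
        return [0.0] * n_bins

    # Normalise so that the bins sum to 1 regardless of flow length.
    return [c / total for c in counts]
```

For example, `psd_bins([40, 60, 120, 1500])` yields a 15-element vector with 0.5 in the first bin, 0.25 in the second and 0.25 in the last. A vector of this kind could then be concatenated with other per-flow statistics and passed to a classifier.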