An Anti-Spam Filter Based on One-Class IB Method in Small Training Sets


Chen Yang1, Shaofeng Zhao2, Dan Zhang3, and Junxia Ma1

1School of Software Engineering, Zhengzhou University of Light Industry, China

2Henan University of Economics and Law, China

3Geophysical Exploration Center of China Earthquake Administration, China

 

Abstract: We present an approach to email filtering based on the one-class Information Bottleneck (IB) method for small training sets. When the themes of emails change continually, the available training set that is highly relevant to the current theme will be small. Hence, we show how to estimate the learning model and how to filter spam when only small training sets are available. First, to preserve classification accuracy and avoid over-fitting while substantially reducing the training set size, we formulate learning as finding a one-class centroid averaged only over highly positive emails. Second, we design a simple binary classification model that filters spam by comparing the similarity between each email and the centroids. Experimental results show that on small training sets our method can significantly improve classification accuracy compared with currently popular methods such as Naive Bayes, AdaBoost and SVM.
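The two steps named in the abstract (a centroid averaged only over a class's most representative emails, then a binary decision made by comparing an email's similarity to centroids) can be illustrated with a minimal sketch. It is not the authors' one-class IB algorithm: the bag-of-words features, cosine similarity, the "more similar centroid wins" rule, and the hypothetical `keep_fraction` rule for picking "highly positive" emails are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    # Hypothetical featurizer: lowercase whitespace tokens -> term counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-weight dicts (0.0 if either is empty).
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    # Average a list of sparse vectors term by term.
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def one_class_centroid(texts, keep_fraction=0.5):
    # Assumption: "highly positive" emails are the keep_fraction of one class's
    # emails most similar to that class's plain mean; the centroid averages only those.
    vecs = [tf_vector(t) for t in texts]
    rough = mean_vector(vecs)
    ranked = sorted(vecs, key=lambda v: cosine(v, rough), reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return mean_vector(ranked[:k])


def classify(text, spam_centroid, ham_centroid):
    # Binary decision: label the email after the centroid it is more similar to.
    v = tf_vector(text)
    return "spam" if cosine(v, spam_centroid) > cosine(v, ham_centroid) else "ham"


if __name__ == "__main__":
    # Tiny illustrative training sets (made up for this example).
    spam = ["win free money now", "free prize claim now", "cheap pills free offer"]
    ham = ["meeting agenda for monday", "please review the project report",
           "lunch meeting on monday"]
    spam_c = one_class_centroid(spam, keep_fraction=0.67)
    ham_c = one_class_centroid(ham, keep_fraction=0.67)
    print(classify("claim your free money", spam_c, ham_c))
    print(classify("agenda for the monday meeting", spam_c, ham_c))
```

Averaging only the top-ranked emails of each class keeps outliers out of the centroid, which is one plausible reading of why a centroid "only averaged by highly positive emails" could resist over-fitting when few training emails exist.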

 

Keywords: IB method, one-class IB, anti-spam filter, small training sets.

 

Received September 5, 2014; accepted November 25, 2014

