Phishing Detection using RDF and Random Forests
Vamsee Muppavarapu, Archanaa Rajendran, and Shriram Vasudevan
Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham University, India
Abstract: Phishing is one of the major threats of the internet era. In a
phishing attack, a legitimate website is cloned and victims are lured to the
fake website to disclose personal and confidential information, which can
prove costly. Although most websites display a disclaimer warning users about
phishing, users tend to ignore it. A disclaimer alone is not a fully
responsible measure on the part of the websites, but there is little more that
they can do. Because phishing has persisted for a long time, many approaches
have been proposed to detect phishing websites, but few, if any, accurately
identify the target websites of these attacks. Our proposed method is novel and
extends our previous work: we identify phishing websites with a combined
approach that constructs Resource Description Framework (RDF) models and uses
ensemble learning algorithms to classify websites. The system is trained with
supervised learning techniques. The approach achieves a true positive rate of
98.8%. Because the random forest classifier we use can handle missing values in
the dataset, we were able to reduce the false positive rate of the system to
1.5%. By combining the strengths of RDF and ensemble learning methods, which
work hand in hand, the system achieves an accuracy of 98.68%.
Keywords: Phishing, ensemble learning, RDF models, phishing target, metadata, vocabulary, random forests.
Received April 22, 2015; accepted September 20, 2015
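
The abstract reports three figures: true positive rate, false positive rate, and accuracy. As a reading aid, the following is a minimal sketch of how these quantities are conventionally computed from a classifier's predictions, assuming the usual confusion-matrix definitions with phishing as the positive class. The names `ConfusionCounts`, `tally`, and the sample labels are hypothetical illustrations; they are not the paper's code or data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # phishing sites classified as phishing
    fn: int  # phishing sites classified as legitimate
    fp: int  # legitimate sites classified as phishing
    tn: int  # legitimate sites classified as legitimate


def tally(y_true, y_pred):
    """Count outcomes; label 1 = phishing, label 0 = legitimate (assumed encoding)."""
    counts = ConfusionCounts(tp=0, fn=0, fp=0, tn=0)
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts.tp += 1
        elif truth == 1 and pred == 0:
            counts.fn += 1
        elif truth == 0 and pred == 1:
            counts.fp += 1
        else:
            counts.tn += 1
    return counts


def true_positive_rate(c):
    """Fraction of actual phishing sites that were detected."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def false_positive_rate(c):
    """Fraction of legitimate sites that were wrongly flagged."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


def accuracy(c):
    """Fraction of all sites that were classified correctly."""
    total = c.tp + c.fn + c.fp + c.tn
    return (c.tp + c.tn) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    y_true = [1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 1]
    c = tally(y_true, y_pred)
    print(true_positive_rate(c), false_positive_rate(c), accuracy(c))
```

Note that accuracy depends on the mix of phishing and legitimate sites in the test set, whereas the true and false positive rates are each computed within a single class.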