Using Machine Learning Techniques for Subjectivity
Analysis based on Lexical and Non-Lexical Features
Hikmat Ullah Khan and Ali Daud
Department of Computer Science, COMSATS Institute of
Information, Pakistan
Abstract: Machine learning techniques have been used to address a wide variety
of problems, and document classification is one of their main applications.
Opinion mining has emerged as an active research domain due to its wide range of
applications, such as multi-document summarization, opinion mining of documents,
analysis of users’ reviews, and improving answers to opinion questions in
forums. Existing works classify documents using lexicon-based
features only. In this work, four state-of-the-art machine learning techniques
are applied to classify content as subjective or objective. Subjective content
contains opinionated information, while objective content contains factual
information. The main contribution lies in the introduction of non-lexical and
content-based features in addition to a conventional lexicon-based feature set.
We compare the results of the four machine learning techniques and discuss
their performance across diverse categories of lexical and non-lexical
features. The comparative
analysis is carried out using standard performance evaluation measures, and the
experiments are performed on a real-world dataset from an online forum covering
diverse topics. The results indicate that the proposed content-based and
non-lexical, thread-specific features contribute to the classification of
subjective and non-subjective content.
Keywords: Machine learning, classification, opinion mining, lexicon, non-lexical features.
Received December 28, 2014; accepted August 31, 2015