Opinion within Opinion: Segmentation Approach for
Urdu Sentiment Analysis
Muhammad Hassan and Muhammad Shoaib
Department of Computer Science and Engineering, University of Engineering and Technology, Pakistan
Abstract:
In computational linguistics, sentiment analysis facilitates the classification
of an opinion as a positive or a negative class. Urdu is a widely used language
in different parts of the world, and classification of the opinions given in
the Urdu language is as important as for any other language. The literature
contains very limited research on sentiment analysis of the Urdu language, and
the Bag-of-Words model dominates the research methods used for this purpose.
Bag-of-Words based models fail to classify a subset of complex sentiments:
those containing more than one opinion. However, no known literature
identifies and utilizes sub-opinion level information. In this paper, we
propose a method based on sub-opinions within the text to determine the overall
polarity of the sentiment in Urdu language text. The proposed method classifies
a sentiment in three steps. First, it segments the sentiment into two fragments
using a set of hypotheses. Next, it calculates the orientation scores of these
fragments independently. Finally, it estimates the polarity of the sentiment
using the scores of the fragments. We developed a computational model to
empirically evaluate the proposed method. The proposed method increases
precision by 8.46%, recall by 37.25% and accuracy by 24.75%, which is a
significant improvement over the existing techniques based on the Bag-of-Words
model.
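The three steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicon entries (romanized here for readability), the contrastive split markers, the weight given to the second fragment, and the neutral fallback are all hypothetical placeholders standing in for the hypotheses and scoring scheme that the abstract does not specify.

```python
# Illustrative sketch only. The lexicon, the split markers, and the
# combination rule are assumptions, not the paper's actual hypotheses.

# Hypothetical polarity lexicon (romanized Urdu words).
LEXICON = {"acha": 1.0, "behtareen": 2.0, "kharab": -1.0, "bura": -1.0}

# Hypothetical contrastive markers used as split points.
SPLIT_MARKERS = {"lekin", "magar"}


def segment(tokens):
    """Step 1: split the tokens into two fragments at the first marker."""
    for i, tok in enumerate(tokens):
        if tok in SPLIT_MARKERS:
            return tokens[:i], tokens[i + 1:]
    # No marker found: treat the whole text as a single fragment.
    return tokens, []


def orientation(fragment):
    """Step 2: score one fragment by summing lexicon polarities."""
    return sum(LEXICON.get(tok, 0.0) for tok in fragment)


def classify(text, second_weight=2.0):
    """Step 3: combine the two fragment scores into one polarity.

    Weighting the second fragment more heavily is an assumption made
    for this sketch; so is the "neutral" fallback for a zero score.
    """
    first, second = segment(text.lower().split())
    total = orientation(first) + second_weight * orientation(second)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("camera acha hai lekin battery kharab hai"))
```

In this toy example a plain Bag-of-Words sum over the whole sentence would be zero, because one positive and one negative word cancel out, whereas scoring the two fragments separately lets the combination rule resolve the polarity.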
Keywords: Sentiment analysis, Urdu natural language processing, social media
mining, Urdu discourse analysis.
Received December 7, 2014; accepted January 20, 2016