Opinion within Opinion: Segmentation Approach for Urdu Sentiment Analysis

Muhammad Hassan and Muhammad Shoaib

Department of Computer Science and Engineering, University of Engineering and Technology, Pakistan

Abstract: In computational linguistics, sentiment analysis classifies an opinion as positive or negative. Urdu is widely used in different parts of the world, and classifying opinions expressed in Urdu is as important as for any other language. The literature contains little research on sentiment analysis for Urdu, and the Bag-of-Words model dominates the methods used for this purpose. Bag-of-Words based models fail to classify a subset of complex sentiments, namely those that contain more than one opinion. To our knowledge, no published work identifies and utilizes sub-opinion level information. In this paper, we propose a method based on sub-opinions within the text to determine the overall polarity of a sentiment in Urdu text. The proposed method classifies a sentiment in three steps. First, it segments the sentiment into two fragments using a set of hypotheses. Next, it calculates the orientation scores of these fragments independently. Finally, it estimates the polarity of the sentiment from the scores of the fragments. We developed a computational model to evaluate the proposed method empirically. The proposed method increases precision by 8.46%, recall by 37.25% and accuracy by 24.75%, a significant improvement over existing techniques based on the Bag-of-Words model.
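
The three steps named in the abstract (segment, score each fragment, combine) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the abstract does not state the segmentation hypotheses, so the sketch splits at a contrastive conjunction; the four-word lexicon, the weight given to the second fragment, and the "neutral" fallback for a zero score are likewise hypothetical.

```python
# Illustrative sketch only: the markers, lexicon, and combination rule are
# assumptions, not the hypotheses or scoring used in the paper.

# Assumed split points: contrastive conjunctions ("but", "however").
CONTRAST_MARKERS = ("لیکن", "مگر")

# Toy polarity lexicon: good, excellent, bad, poor.
LEXICON = {"اچھا": 1, "بہترین": 1, "برا": -1, "خراب": -1}


def segment(tokens):
    """Step 1: split the tokens into two fragments at the first marker."""
    for i, tok in enumerate(tokens):
        if tok in CONTRAST_MARKERS:
            return tokens[:i], tokens[i + 1:]
    return tokens, []  # no marker found: the second fragment is empty


def score(fragment):
    """Step 2: sum the lexicon polarity of the words in one fragment."""
    return sum(LEXICON.get(tok, 0) for tok in fragment)


def label(total):
    """Map a numeric score to a class; 'neutral' on zero is an assumption."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def classify_segmented(text, second_weight=2.0):
    """Step 3: combine fragment scores, weighting the second fragment more."""
    first, second = segment(text.split())
    return label(score(first) + second_weight * score(second))


def classify_bag_of_words(text):
    """Baseline: score all words together, ignoring the fragment boundary."""
    return label(score(text.split()))


if __name__ == "__main__":
    # "The food was good but the service was poor."
    example = "کھانا اچھا تھا لیکن سروس خراب تھی"
    print(classify_bag_of_words(example))
    print(classify_segmented(example))
```

In this toy setting the two opinions in the example cancel out under the flat word count, while the segmented version lets the fragment after the conjunction decide the result. That is the kind of multi-opinion sentence the abstract says Bag-of-Words models fail on, though the actual hypotheses and scoring in the paper may differ.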

Keywords: Sentiment analysis, Urdu natural language processing, social media mining, Urdu discourse analysis.

Received December 7, 2014; accepted January 20, 2016
