Mahalanobis Distance: The Ultimate Measure for Sentiment Analysis
Valarmathi Balasubramanian1, Srinivasa Nagarajan2 and Palanisamy Veerappagoundar3
1Faculty of Soft Computing Division, VIT University, India
2Faculty of Manufacturing Division, VIT University, India
3Info Institute of Engineering, India
Abstract: In this paper, Mahalanobis Distance (MD) is proposed as a measure for classifying the sentiment expressed in a review document as either positive or negative. Text documents are represented using Representative Terms (RT), a new method that describes each document with only a few representative dimensions. This form of representation is a relatively new concept, and it is successfully demonstrated in this paper. In experiments on a benchmark dataset containing 25000 movie reviews, the MD based classifier achieved an accuracy of 70.8%. A hybrid of the Mahalanobis Distance based Classifier (MDC) and a Multi Layer Perceptron (MLP) achieved a classification accuracy of 98.8%, which is the highest accuracy reported so far for a dataset containing 25000 reviews.
Keywords: Sentiment analysis, MD, opinion mining, machine learning algorithms, hybrid classifier.
Received August 20, 2012; accepted September 26, 2013
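As an illustration of the idea summarised in the abstract, the following is a minimal sketch of a two-class classifier that assigns a document to the class whose centre is nearest in Mahalanobis distance. It is not the authors' implementation: the nearest-class decision rule, the function names (`fit_class`, `mahalanobis`, `classify`) and the two-dimensional toy vectors are assumptions made for illustration only. In the paper, the feature vectors would come from the Representative Terms representation.

```python
import numpy as np

def fit_class(X):
    """Estimate the mean vector and inverse covariance of one class.

    X holds one feature vector per row (hypothetical stand-ins for
    Representative Terms features).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # The pseudo-inverse is used so a singular covariance does not fail.
    return mean, np.linalg.pinv(cov)

def mahalanobis(x, mean, inv_cov):
    """Mahalanobis distance of x from a class described by (mean, inv_cov)."""
    diff = np.asarray(x, dtype=float) - mean
    # Clamp at zero to absorb tiny negative values from rounding.
    return float(np.sqrt(max(diff @ inv_cov @ diff, 0.0)))

def classify(x, pos_model, neg_model):
    """Assumed decision rule: pick the class with the smaller distance."""
    d_pos = mahalanobis(x, *pos_model)
    d_neg = mahalanobis(x, *neg_model)
    return "positive" if d_pos <= d_neg else "negative"

# Toy training vectors, invented for this sketch only.
pos_train = [[3, 1], [4, 0], [5, 1], [4, 2]]
neg_train = [[1, 4], [0, 3], [1, 2], [2, 3]]

pos_model = fit_class(pos_train)
neg_model = fit_class(neg_train)

label_a = classify([4, 1], pos_model, neg_model)
label_b = classify([1, 3], pos_model, neg_model)
```

Unlike Euclidean distance, the Mahalanobis distance rescales each direction by the class covariance, so correlated or high-variance features do not dominate the comparison.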