The Impact of Natural Language Preprocessing on Big Data Sentiment Analysis
Abstract: Sentiment
analysis determines people's opinions, sentiments, and emotions by
classifying their written text as having positive or negative polarity.
It is important for many critical applications, such as decision
making and product evaluation. Social networks are one of the main sources of
data for sentiment analysis. However, the huge volume of data produced by social
networks requires efficient and scalable analysis techniques.
MapReduce has proved efficient and scalable in handling big data and has therefore
attracted many researchers to use it as a processing framework. In
this paper, a sentiment analysis method for big data is studied. The method
uses the Naïve Bayes algorithm to classify texts as having positive or negative
polarity. Several linguistic and Natural Language Processing (NLP) preprocessing
techniques are applied to a Twitter data set to study their impact on the accuracy
of big data classification. The experiments indicate that these techniques improve
the accuracy of the sentiment analysis by 5%, yielding an accuracy of 73% on the
Stanford Sentiment data set.
Keywords: big data, natural language processing, MapReduce framework, Naïve Bayes,
sentiment analysis.
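
As a rough illustration of the kind of pipeline the abstract describes, the following minimal Python sketch preprocesses short texts and classifies them with a multinomial Naïve Bayes model using add-one smoothing. It is not the paper's implementation: the specific preprocessing steps (lowercasing, removing URLs and @mentions, dropping stop words), the tiny stop-word list, and the function names `preprocess`, `train`, and `classify` are all assumptions made for this example, and the sketch runs on a single machine rather than on MapReduce.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "this", "to", "of", "and", "i"}


def preprocess(text):
    """Lowercase, strip URLs and @mentions, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def train(labeled_texts):
    """Count documents per class and token occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in labeled_texts:
        class_counts[label] += 1
        word_counts[label].update(preprocess(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    tokens = preprocess(text)
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        # Add the smoothed log likelihood of each token.
        for tok in tokens:
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The per-class counts built in `train` are plain sums, so in principle they could be accumulated in a map step and merged in a reduce step; that mapping is an assumption of this sketch, not a description of the method studied in the paper.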