1 Abstract
Sentiment analysis is the automated process of interpreting an opinion about a subject expressed in text or speech. The Bag-of-Words (BOW) model, although simple and efficient for representing textual data in sentiment analysis, often gives inaccurate predictions under polarity shift because of some fundamental deficiencies. It disrupts word order, breaks syntactic structures, and discards some semantic information, so it considers two texts with opposite sentiment to be very similar. Because the BOW model cannot handle the polarity shift problem, statistical machine learning algorithms built on it tend to fail in such cases. To address this, a coupled sentiment analysis is developed in which a sentiment-reversed review is created for each training and testing review in the dataset. The data collected in this project is a dataset of cellphone reviews from “amazon.com”. After performing coupled sentiment analysis on the dataset, we observe improved accuracy in predicting review sentiment compared with using sentiment analysis alone.
2 Introduction
Sentiment is a perspective, and sentiment analysis is the study of individuals’ perceptions of specific entities. Users post content on social media through social networking sites, forums, and online journals, while researchers and developers collect data for analysis through the application programming interfaces (APIs) that social media sites make available. Data available online is often noisy and does not convey crisp information, which obstructs the process of sentiment analysis. The quality of comments posted online cannot be taken for granted, since anyone is free to post. Hence, it is essential to process and analyze textual data to determine the sentiment orientation of online comments. Online product reviews are widely used in opinion mining, as customers rely heavily on the sentiments expressed in the text (Pak, A., & Paroubek, P. 2010, May).
With the growing popularity and availability of opinion-rich user-generated text, such as online review sites and personal blogs, the need for efficient techniques for analyzing that data has grown as well. Research on sentiment analysis, or sentiment classification, has been on the rise since 2000 (Liu, B. 2012). Its goal is to classify text, which is generally unstructured, according to the sentiment polarity of the users’ thoughts or opinions, e.g., positive or negative. Data mining tools and algorithms are used to discover and analyze consumers’ sentiments and attitudes toward products they have purchased or want to buy (Jack, L., & Tsai, 2015). Sentiment analysis, or opinion mining, draws on a wide range of fields, such as natural language processing, decision making, and linguistics.
It is a type of text analysis used to classify text, which supports decision making by differentiating between products and highlighting customer priorities for specific items. In statistical machine learning techniques, the bag-of-words (BOW) model is the most widely used approach for representing text data in sentiment classification. Also commonly referred to as the vector space model, the BOW model treats a text as an orderless collection of words, disregarding grammar and word order but retaining how often each word occurs. Despite being the most widely used representation in topic-based text classification, the BOW model has a few fundamental deficiencies: it requires a lexicon to be assessed manually to evaluate words, and it analyzes sentiment with lower precision because it ignores the semantics of words and disrupts word order and grammar (El-Din, D. M. 2016). Because of these inadequacies, the bag-of-words model is unable to resolve the polarity shift problem. Polarity shift is a phenomenon in sentiment classification in which the sentiment polarity of a review’s text is reversed, i.e., from negative to positive or from positive to negative. Certain polarity shifters, also known as valence shifters, such as negation expressions, can shift the sentiment polarity of a text entirely (Ikeda, D., Takamura, H., Ratinov, L. A., & Okumura, M. 2008). Once a negation word such as “no”, “don’t”, or “not” is added to a positive text, the sentiment orientation of that text is reversed from positive to negative. For instance, inserting the negation word “not” into the positive sentence “The phone’s camera is good.” to give “The phone’s camera is not good.” reverses the sentiment of the sentence from positive to negative. The BOW model considers such sentiment-reversed texts to be nearly the same, which causes machine learning based systems to fail under polarity shift.
Therefore, the polarity shift problem needs to be handled in order to improve the performance of machine learning models and obtain more accurate results.
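The similarity that the BOW model assigns to the camera example and its negated form can be checked with a short sketch. The tokenizer, the function names, and the use of cosine similarity as the measure are illustrative assumptions and are not taken from the method described in this work.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into word tokens, and count each token.

    Assumption: a simple regex tokenizer; a real pipeline may differ.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# The example sentence and its negated counterpart.
positive = "The phone's camera is good."
negative = "The phone's camera is not good."

bow_pos = bag_of_words(positive)
bow_neg = bag_of_words(negative)

# The two vectors differ only in the single token "not".
similarity = cosine_similarity(bow_pos, bow_neg)
print(f"cosine similarity: {similarity:.3f}")  # prints "cosine similarity: 0.913"
```

The two sentences share five of six tokens, so their count vectors are almost parallel even though their sentiments are opposite. A classifier that sees only these vectors has very little signal with which to separate them.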