In the last decade, sentiment analysis (SA), also known as opinion mining, has attracted increasing interest. It is a hard challenge for language technologies, and achieving good results is much more difficult than is commonly assumed. The task of automatically classifying a text written in a natural language as expressing a positive or negative feeling, opinion or subjectivity (Pang and Lee, 2008) is sometimes so complicated that even human annotators disagree on the classification to be assigned to a given text. Opinion mining is nonetheless a powerful tool for building smarter products. It is a natural language processing technique that gives a general idea of the positive, neutral, and negative sentiment of texts. Social media monitoring applications and companies rely on sentiment analysis and machine learning to gain insights about mentions, brands, and products. An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object [2].
1.1 General Overview
1.1.1 Sentiment Analysis:
Sentiment classification is a technique that focuses on the sentiments or opinions expressed in an article or conveyed orally. The term sentiment covers emotions, conclusions, behavior and more. In this report, we concentrate on human-readable text written on e-commerce sites.
1.1.2 Opinion Mining:
Opinion mining involves analyzing the opinions, sentiments or attitude of the writer from the written text. It uses concepts from NLP, data mining and machine learning to perform this task. This section analyzes the requirements for opinion mining. In the following sections, we concentrate on sentiment mining tasks and present a review.
1.1.3 Why opinion mining:
Online opinions have an indirect influence on the business of many e-commerce sites. These sites market their products, and web users read the reviews of a product before buying it. Many organizations use opinion mining systems to track customer reviews of products sold online. Opinion mining is an effective way of keeping track of business trends related to sales management, reputation management and advertising. Customer opinions are also used for trend prediction.
We have chosen to work with Twitter since we feel it is a better approximation of public sentiment than conventional internet articles and web blogs [3]. The reason is that the amount of relevant data is much larger for Twitter than for traditional blogging sites. Moreover, the response on Twitter is more prompt and also more general (since the number of users who tweet is substantially larger than the number who write web blogs on a daily basis). Sentiment analysis of the public is highly relevant to macro-scale socioeconomic phenomena such as predicting the stock market value of a particular firm. This could be done by analysing overall public sentiment towards that firm over time and using tools from economics to find the correlation between public sentiment and the firm’s stock market value. Firms can also estimate how well their product is being received in the market, and in which areas of the market it is receiving a favourable response and in which a negative one (since Twitter allows us to download a stream of geo-tagged tweets for particular locations). If firms can get this information, they can analyze the reasons behind geographically differentiated responses and market their product in a more optimized manner, for example by creating suitable market segments. Predicting the results of popular political elections and polls is also an emerging application of sentiment analysis. One such study was conducted by Tumasjan et al. in Germany for predicting the outcome of federal elections, which concluded that Twitter is a good reflection of offline sentiment [4].
1.3 Domain Introduction
This project of analyzing the sentiment of tweets comes under the domain of Pattern Classification and Data Mining. These two terms are closely related and intertwined, and they can be formally defined as the process of discovering useful patterns in large sets of data, either automatically (unsupervised) or semi-automatically (supervised). The project relies heavily on techniques of Natural Language Processing for extracting significant patterns and features from the large data set of tweets, and on Machine Learning techniques for accurately classifying individual unlabelled data samples (tweets) according to whichever pattern model best describes them.

Some features are semantic. For example, the word excellent has a strong positive connotation while the word evil has a strong negative connotation. So whenever a word with a positive connotation is used in a sentence, chances are that the entire sentence expresses a positive sentiment [5]. Parts of Speech tagging, on the other hand, is a syntactic approach to the problem. It means automatically identifying which part of speech each individual word of a sentence belongs to: noun, pronoun, adverb, adjective, verb, interjection, etc. Patterns can be extracted by analyzing the frequency distribution of these parts of speech (either individually or in combination with some other part of speech) in a particular class of labeled tweets. Twitter-based features are more informal and relate to how people express themselves on online social platforms and compress their sentiments into the limited space of 140 characters offered by Twitter. They include Twitter hashtags, retweets, word capitalization, word lengthening, question marks, the presence of URLs in tweets, exclamation marks, internet emoticons and internet slang.

Classification techniques can also be divided into two categories: supervised vs. unsupervised, and non-adaptive vs. adaptive/reinforcement techniques.
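The Twitter-based features listed above can be illustrated with a small extractor. This is only a sketch under stated assumptions: the function name `twitter_features`, the tiny `EMOTICONS` set and the whitespace tokenization are hypothetical choices made for illustration, not part of the project's actual feature set.

```python
import re

# Hypothetical, deliberately small emoticon list used only for illustration.
EMOTICONS = {":)", ":-)", ":(", ":-(", ":D", ";)"}


def twitter_features(tweet: str) -> dict:
    """Count a few surface features of a single tweet."""
    tokens = tweet.split()
    return {
        # Tokens such as "#happy".
        "hashtags": sum(1 for t in tokens if t.startswith("#") and len(t) > 1),
        # Simplifying assumption: a retweet starts with the token "RT".
        "is_retweet": tokens[:1] == ["RT"],
        # Fully capitalized words longer than two letters (skips "I" and "RT").
        "all_caps_words": sum(
            1 for t in tokens if len(t) > 2 and t.isalpha() and t.isupper()
        ),
        # Words containing a character repeated three or more times in a row.
        "lengthened_words": sum(
            1
            for t in tokens
            if not t.startswith("http") and re.search(r"(\w)\1{2,}", t)
        ),
        "question_marks": tweet.count("?"),
        "exclamation_marks": tweet.count("!"),
        "has_url": bool(re.search(r"https?://\S+", tweet)),
        "emoticons": sum(1 for t in tokens if t in EMOTICONS),
    }


features = twitter_features(
    "RT @shop: I LOVE this phone sooooo much!!! :) #happy http://example.com"
)
```

Each count could then be used as one dimension of the feature vector passed to a classifier.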
The supervised approach applies when we have pre-labeled data samples available and use them to train our classifier. Training the classifier means using the pre-labeled data to extract features that best model the patterns and differences between the individual classes, and then classifying an unlabeled data sample according to whichever pattern best describes it. For example, suppose we come up with a highly simplified model in which neutral tweets contain 0.3 exclamation marks per tweet on average while sentiment-bearing tweets contain 0.8. If the tweet we have to classify contains 1 exclamation mark then (ignoring all other possible features) it would be classified as subjective, since 1 exclamation mark is closer to the model of 0.8 exclamation marks. Unsupervised classification applies when we do not have any labeled data for training. In addition, adaptive classification techniques deal with feedback from the environment.

Several metrics have been proposed for computing and comparing the results of our experiments. Some of the most popular include Precision, Recall, Accuracy, F1-measure, True rate and False alarm rate (each of these metrics is calculated individually for each class and then averaged for the overall classifier performance). A typical confusion table for our problem is given below, along with an illustration of how to compute the required metrics [6].
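The exclamation-mark example and the evaluation metrics described above can be sketched together as follows. The four tweets and their gold labels are hypothetical data invented for illustration, the function names are assumptions, and the false-alarm rate uses the standard definition FP / (FP + TN).

```python
def classify_by_exclamations(tweet, neutral_mean=0.3, subjective_mean=0.8):
    """Assign the class whose average exclamation-mark count is closest."""
    count = tweet.count("!")
    if abs(count - subjective_mean) < abs(count - neutral_mean):
        return "subjective"
    return "neutral"


def binary_metrics(gold, predicted, positive="subjective"):
    """Compute metrics from the four cells of a 2x2 confusion matrix."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        if g == positive and p == positive:
            tp += 1
        elif g != positive and p == positive:
            fp += 1
        elif g == positive and p != positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "false_alarm": false_alarm,
    }


# Hypothetical labeled tweets (human judgment = gold label).
tweets = [
    ("Great phone!", "subjective"),
    ("The store opens at nine.", "neutral"),
    ("I hate waiting in line", "subjective"),
    ("Flight lands at 5!", "neutral"),
]
gold = [label for _, label in tweets]
predicted = [classify_by_exclamations(text) for text, _ in tweets]
metrics = binary_metrics(gold, predicted)
```

With a single feature the classifier misses the third tweet and raises a false alarm on the fourth, which is why a realistic model combines many features.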
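                  Machine says yes    Machine says no
Human says yes    TP                  FN
Human says no     FP                  TN
Table 1.1: A Typical 2×2 Confusion Matrix
Precision (P): $\frac{TP}{TP + FP}$
Recall (R): $\frac{TP}{TP + FN}$
Accuracy (A): $\frac{TP + TN}{TP + FP + FN + TN}$
F1: $\frac{2PR}{P + R}$
True Rate (T): $\frac{TP}{TP + FN}$
False-alarm (F): $\frac{FP}{FP + TN}$
CHAPTER 2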
This thesis focuses on sentiment analysis, including sentiment classification and sentiment retrieval. This has been an active field of research over the past two decades, attracting computer scientists, computational linguists, psychologists and even sociologists. In this chapter we first introduce the core concepts and techniques in sentiment analysis, which are referred to later in this thesis [7]; we then discuss recent developments in sentiment analysis dedicated to social media. We have explored approaches to improving sentiment analysis by resolving the topic dependency problem and reducing topic-irrelevant or factual text. As such, we discuss work relevant to topic-sentiment interactions for sentiment analysis in the third and fourth sections. We discuss other work relevant to this thesis in the last section.
2.1.1 Sentiment Analysis in General
In general, sentiment analysis deals with the computational treatment of (in alphabetical order) opinion, sentiment, and subjectivity in text (Pang and Lee, 2008). The notion of sentiment, in the context of this thesis, refers to opinions and subjectivity. Subjectivity was first defined by linguists. A prominent study by Quirk et al. (1985) defined a private state as something not open to objective observation or verification, which included emotions, opinions and other subjective content. This definition of private state was used in the seminal work of Wiebe (1994), who defined the private state as a tuple (p, experiencer, attitude, object), representing an experiencer’s state p with respect to his/her attitude toward an object. Most other studies did not adopt such a strict definition, but used a simplified model that only concerned the polarity of the sentiment and the target of the sentiment (Hu and Liu, 2004; Pang et al., 2002). In the TREC Blog Track and the Microblog Track, sentiment analysis tasks are done in the context of information retrieval; therefore sentiment polarities are the target of classification, and are annotated in the context of specific topics or queries.
2.1.2 What is Sentiment Analysis?
As introduced previously, the term sentiment analysis can refer to either sentiment retrieval or sentiment classification. The former aims to retrieve the subjective content of interest. At the document level [8], a typical example is subjective summarization, which retrieves the opinionated portion of a document, often with respect to a user-given query. Sentiment retrieval can also take place at the collection level, to retrieve opinionated documents from a collection. More recent examples include aspect-based opinion summarization, which operates at the phrase level or sentence level to retrieve opinions towards aspects or product features.
This branch of sentiment analysis has gained increasing popularity over the past years, partially thanks to the rapid development of topic modeling techniques. Sentiment classification is focused on the evaluation and prediction of the sentiment polarities (positive, negative, etc.) of a piece of text (a word, a phrase, a sentence, a document, or a list of documents). Another term for polarity is semantic orientation (Turney, 2002). Earlier works on SA are mostly related to sentiment classification, as polarity is a natural abstraction of collective opinion. Most studies classify comments or documents into two categories: positive versus negative. This is also referred to as binary polarity classification. Koppel and Schler (2006) argued that there were other comments that might express a mixed or neutral sentiment and proposed multi-class sentiment classification. Their study showed that incorporating a neutral category can lead to significant improvement in overall classification accuracy, and that this is achieved by properly combining pairwise classifiers. Multi-class sentiment classification is also done in TREC Blog tasks. Aside from binary and multi-class sentiment classification, there is work that focuses on rating inference, which attempts to classify the author’s opinion with respect to a multi-point scale (e.g. star ratings), but this is beyond the scope of this thesis. We use the following example to further explain the tasks of sentiment analysis [9]. The excerpt is from a blog post in the TREC Blog08 Collection, under topic 1032. The query for this topic is ‘What is the opinion of the movie I Walk the Line’, and this post is annotated as positive, which means it contains positive opinion towards the topic.

I never knew much about Johnny Cash; I didn’t really know what to expect out of this movie. I don’t even know if Walk The Line was an accurate depiction of Cash’s life, but I don’t really care either. Walk The Line was an incredible movie, an easy 5 out of 5… With witty dialog and a superb story, this movie was a lot of fun. The acting was better than good. At times, I felt as if Cash’s dead ghost was possessing Joaquin’s body. Hey, if you’re on the edge about this movie, just remember one thing: Johnny Cash recorded an album in a freaking prison. Now, that’s cool.
In a sentiment classification task, the aim is to automatically predict the polarity of the document (positive). In a sentiment retrieval task, the target may be retrieving the opinions relevant to the topic of interest. The majority of the sentences in the excerpt above are indeed expressing opinions on the topic, and comment on multiple aspects of the movie including the dialog, story, etc. Aspect-based summarization may produce the following output:
Acting: better than good, etc…
By combining aspect-opinion pairs mined from a collection of documents, it is possible to mine richer opinion-related information than polarities alone. That said, aspect-based summarisation is typically applied to formal product reviews and not to informal social media text, as the latter is much less structured, and the latent relationship between opinion and target is often expressed implicitly [10].
2.2 Types of Approach
2.2.1 Text Mining Process
Text mining, also referred to as text data mining, roughly equivalent to text analytics, is the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. ‘High quality’ in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
Text analysis involves information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics. The overarching goal is, essentially, to turn text into data for analysis, via application of natural language processing (NLP) and analytical methods.
A typical application is to scan a set of documents written in a natural language and either model the document set for predictive classification purposes or populate a database or search index with the information extracted.
2.2.2 Lexicon-Based Approach
The lexicon-based approach is based on the assumption that the contextual sentiment orientation is the sum of the sentiment orientations of each word or phrase. Turney (2002) identifies sentiment based on the semantic orientation of reviews. Taboada et al. (2011), Melville et al. (2011) and Ding et al. (2008) use a lexicon-based approach to extract sentiment [11]. Sentiment Analysis on microblogs is more challenging than on longer discourses such as reviews. The major challenges for microblog sentiment analysis are the short length of status messages, informal words, word shortening, spelling variation and emoticons. Sentiment Analysis on Twitter data has been researched by Bifet and Frank (2010), Bermingham and Smeaton (2010) and Pak and Paroubek (2010). We use our own lexicon-based approach to extract sentiment. Open lexicons such as SentiWordNet (Esuli and Sebastiani, 2006; Baccianella et al., 2010), Q-WordNet (Agerri and García-Serrano, 2010) and WordNet-Affect (Strapparava and Valitutti, 2004) have been developed to support Sentiment Analysis.

Studies have also been made on preprocessing tweets. Han and Baldwin (2011) used a classifier to detect word variation and match the related word. Kaufmann and Kalita (2010) give a full preprocessing approach for converting a tweet to normal text. Sentiment Analysis on Twitter data is not confined to raw text. Analyzing emoticons has been an interesting line of study. Go et al. (2009) used emoticons to label tweets as positive or negative and trained standard classifiers such as Naive Bayes, Maximum Entropy, and Support Vector Machines. A hashtag may also carry some sentiment. Davidov et al. (2010) used 50 hashtags and 15 emoticons as sentiment labels for classification, allowing diverse sentiment types for a tweet.

Negation and intensifiers play an important role in Sentiment Analysis. A negation word can reverse the polarity, whereas an intensifier increases sentiment strength. Taboada et al. (2011) studied the role of intensifiers and negation in lexicon-based Sentiment Analysis. Wiegand et al. (2010) survey the role of negation in Sentiment Analysis [12].
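The core assumption of the lexicon-based approach, together with the effect of negation and intensifiers described above, can be sketched in a few lines. The word lists and their weights are hypothetical values chosen for illustration, and negation is modeled as a simple sign flip; published systems use larger dictionaries and more refined rules.

```python
# Hypothetical mini-lexicon: word -> sentiment orientation.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
# Hypothetical negation words and intensifier multipliers.
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "extremely": 2.0}


def lexicon_score(text: str) -> float:
    """Sum word orientations, applying a preceding intensifier and negator."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    total = 0.0
    for i, word in enumerate(tokens):
        if word not in LEXICON:
            continue
        score = LEXICON[word]
        j = i - 1
        # An intensifier directly before the word scales its strength.
        if j >= 0 and tokens[j] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[j]]
            j -= 1
        # A negator before the (possibly intensified) word reverses polarity.
        if j >= 0 and tokens[j] in NEGATORS:
            score = -score
        total += score
    return total


plain = lexicon_score("The screen is very good.")
negated = lexicon_score("The battery is not good.")
mixed = lexicon_score("The battery is not very good but the camera is great!")
```

A positive total suggests positive sentiment and a negative total negative sentiment; where to place the decision threshold is a design choice.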
2.2.3 Machine Learning Approach
Sentiment analysis, also called opinion mining, is a form of information extraction from text that is of growing research and commercial interest. In this paper we present our machine learning experiments with regard to sentiment analysis in blog, review and forum texts found on the World Wide Web and written in English, Dutch and French. We train on a set of example sentences or statements that are manually annotated as positive, negative or neutral with regard to a certain entity. We are interested in the feelings that people express with regard to certain consumer products. We learn and evaluate several classification models that can be configured in a cascaded pipeline. We have to deal with several problems, namely the noisy character of the input texts, the attribution of the sentiment to a particular entity and the small size of the training set. We succeed in identifying positive, negative and neutral feelings towards the entity under consideration with ca. 83% accuracy for English texts, based on unigram features augmented with linguistic features. The accuracy results for the Dutch and French texts are ca. 70% and 68% respectively, due to the larger variety of linguistic expressions that more often diverge from standard language, thus demanding more training patterns. In addition, our experiments give us insights into the portability of the learned models across domains and languages. A substantial part of the article [13, 14, 15] investigates the role of active learning techniques for reducing the number of examples to be manually annotated.
In this approach we deal with two main algorithms, as stated below:
Naive Bayes Algorithm.
Long Short-Term Memory (LSTM) Algorithm.
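To make the first of the two algorithms named above concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. The class name, the whitespace tokenization and the four training sentences are assumptions made for illustration and are not taken from the project's data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace-separated lowercase tokens."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior of the class.
            logp = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in doc.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                # Add-one smoothed log likelihood of the word given the class.
                count = self.word_counts[label][word] + 1
                logp += math.log(count / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical training data.
docs = [
    "good great phone",
    "great camera love it",
    "bad battery",
    "terrible screen bad sound",
]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(docs, labels)
first = model.predict("great phone")
second = model.predict("bad screen")
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.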
TEXT MINING PROCESS
Text mining is the automated process of detecting and revealing new, previously unknown knowledge, inter-relationships and patterns in unstructured textual data resources [16, 17]. Text mining targets undiscovered knowledge in huge amounts of text, whereas search engines and Information Retrieval (IR) systems have a specific search target, such as a search query or keywords, and return related documents. This research field uses data mining algorithms such as classification, clustering, association rules, and many more to explore and discover new information and relationships in textual sources. It is an inter-disciplinary research field combining information retrieval, data mining, machine learning, statistics and computational linguistics. Figure 3.1 summarizes the text mining process. First, a set of unstructured text documents is collected. Then, pre-processing is performed on the documents to remove noise and commonly used words (stop words) and to apply stemming. This process produces a structured representation of the documents known as a term-document matrix, in which every column represents a document and every row represents the occurrences of a term across the documents. The final step is applying data mining techniques such as clustering, classification and association rules to discover term associations and patterns in the text and then, finally, visualizing these patterns using tools such as word clouds or tag clouds.
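The term-document matrix described above can be built with a short sketch like the following. The function name and the two example documents are hypothetical, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import Counter


def term_document_matrix(documents):
    """Return (terms, matrix) with one row per term and one column per document."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    terms = sorted(set().union(*counts))
    # matrix[i][j] = number of times terms[i] occurs in documents[j]
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


# Hypothetical two-document collection.
terms, matrix = term_document_matrix(["good phone good price", "bad phone"])
```

Here `terms` is `["bad", "good", "phone", "price"]` and the row for "good" is `[2, 0]`, since the word occurs twice in the first document and not at all in the second.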
3.1.1 Text Collection
Capturing the text is the first step of the process and aims at generating the data. The resulting text databases can be static or dynamic. A static database remains the same throughout the process, while a dynamic database can be updated at any time [18, 19].
Figure 3.1: The Text Mining Process
Pre-processing prepares the text by converting it into a more structured representation.
1) Tokenization: Tokenization is used to identify all words in a given text.
2) Data Filtering: People use a lot of casual language on Twitter. For example, ‘happy’ may be written as ‘haaaaaaappy’. Though this is meant as the same word ‘happy’, classifiers treat the two as different words. To reduce this variation and make such words more similar to their standard forms, runs of three or more repeated letters are replaced by two occurrences. Thus ‘haaaaappy’ would be replaced by ‘haappy’.
3) Stop Word Removal: This step eliminates words that occur frequently, such as articles, prepositions, conjunctions and adverbs. The stop words depend on the language of the text in question. For example, words like the, and, before, while, and so on do not contribute to the sentiment.
4) Stemming: In information retrieval, stemming is the process of reducing a word to its root form. For example, walking, walker and walked are all derived from the root word walk. Hence, the stemmed form of all these words is walk [20].
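The four pre-processing steps above can be chained into a small pipeline. This is a sketch under stated assumptions: the stop-word list is a hypothetical short sample, and `naive_stem` is a toy suffix stripper written for this example, not an established stemming algorithm. Runs of three or more identical characters are cut down to two, so haaaaappy becomes haappy.

```python
import re

# Hypothetical, very short stop-word list for illustration.
STOP_WORDS = {"the", "and", "before", "while", "a", "an", "is", "of", "to"}
# Toy suffix list; a real stemmer uses far more careful rules.
SUFFIXES = ("ing", "ed", "er", "s")


def tokenize(text):
    """Lowercase the text and extract alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def squeeze_repeats(token):
    """Replace runs of three or more identical characters by two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", token)


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    tokens = [squeeze_repeats(t) for t in tokenize(text)]
    return [naive_stem(t) for t in remove_stop_words(tokens)]


squeezed = squeeze_repeats("haaaaappy")
stems = preprocess("The walker walked while walking")
```

The order of the steps matters: filtering repeated letters before stop-word removal lets lengthened stop words still match the list.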
The analysis step is usually considered the core of text mining, because this is when some type of useful, nontrivial knowledge is extracted from the text.
Finally, the results of the analysis are validated. It is necessary to employ both quantitative and qualitative measures. After such a validation it may be necessary to return to one or more of the previous steps in order to make modifications and try alternatives [21, 22].
LEXICON BASED APPROACH
We present a lexicon-based approach to extracting sentiment from text. The Semantic Orientation CALculator (SO-CAL) uses dictionaries of words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation. SO-CAL [23] is applied to the polarity classification task, the process of assigning a positive or negative label to a text that captures the text’s opinion towards its main subject matter. We show that SO-CAL’s performance is consistent across domains and on completely unseen data. Additionally, we describe the process of dictionary creation, and our use of Mechanical Turk to check dictionaries for consistency and reliability [24, 25].
Semantic orientation (SO) is a measure of subjectivity and opinion in text. It usually captures an evaluative factor (positive or negative) and potency or strength (degree to which the word, phrase, sentence, or document in question is positive or negative) towards a subject topic, person, or idea (Osgood, Suci, and Tannenbaum 1957). When used in the analysis of public opinion, such as the automated interpretation of on-line product reviews, semantic orientation can be extremely helpful in marketing, measures of popularity and success, and compiling reviews. The analysis and automatic extraction of semantic orientation can be found under different umbrella terms: sentiment analysis (Pang and Lee 2008), subjectivity (Lyons 1981; Langacker 1985), opinion mining (Pang and Lee 2008), analysis of stance (Biber and Finegan 1988; Conrad and Biber 2000), appraisal (Martin and White 2005), point of view (Wiebe 1994; Scheibman 2002), evidentiality (Chafe and Nichols 1986), and a few others, without expanding into neighboring disciplines and the study of emotion [26] (Ketal 1975; Ortony, Clore, and Collins 1988) and affect (Batson, Shaw, and Oleson 1992).
In this article, sentiment analysis refers to the general method to extract subjectivity and polarity from text (potentially also speech), and semantic orientation refers to the polarity and strength of words, phrases, or texts. Our concern is primarily with the semantic orientation of texts, but we extract the sentiment of words and phrases towards that goal [27, 28]. There exist two main approaches to the problem of extracting sentiment automatically. The lexicon-based approach involves calculating orientation for a document from the semantic orientation of words or phrases in the document (Turney 2002).
The text classification approach involves building classifiers from labeled instances of texts or sentences (Pang, Lee, and Vaithyanathan 2002), essentially a supervised classification task. The latter approach could also be described as a statistical or machine-learning approach. We follow the first method, in which we use dictionaries of words annotated with the word’s semantic orientation, or polarity. Dictionaries for lexicon-based approaches can be created manually, as we describe in this article (see also Stone et al. 1966; Tong 2001), or automatically, using seed words to expand the list of words (Hatzivassiloglou and McKeown 1997; Turney 2002; Turney and Littman 2003). Much of the lexicon-based research has focused on using adjectives as indicators of the semantic orientation of text (Hatzivassiloglou and McKeown 1997; Wiebe 2000; Hu and Liu 2004; Taboada, Anthony, and Voll 2006). First, a list of adjectives and corresponding SO values is compiled into a dictionary. Then, for any given text, all adjectives are extracted and annotated with their SO value, using the dictionary scores. The SO scores are in turn aggregated into a single score for the text.
The majority of the statistical text classification research builds Support Vector Machine classifiers, trained on a particular data set using features such as unigrams or bigrams, and with or without part-of-speech labels, although the most successful features seem to be basic unigrams (Pang, Lee, and Vaithyanathan 2002; Salvetti, Reichenbach, and Lewis 2006). Classifiers built using supervised methods reach quite a high accuracy in detecting the polarity of a text (Chaovalit and Zhou 2005; Kennedy and Inkpen 2006; Boiy et al. 2007; Bartlett and Albright 2008). However, although such classifiers perform very well in the domain that they are trained on, their performance drops precipitously (almost to chance) when the same classifier is used in a different domain (Aue and Gamon 2005; see also the discussion about domain specificity in Pang and Lee 2008, section 4.4). Consider, for example, an experiment using the Polarity Dataset, a corpus containing 2,000 movie reviews, in which Brooke (2009) extracted the 100 most positive and negative unigram features from an SVM classifier that reached 85.1% accuracy. Many of these features were quite predictable: worst, waste, unfortunately, and mess are among the most negative, whereas memorable, wonderful, laughs, and enjoyed are all highly positive. Other features are domain specific and somewhat inexplicable: If the writer, director, plot, or scripts are mentioned, the review is likely to be disfavorable towards the movie, whereas the mention of performances, the ending, or even flaws, indicates a good movie. Closed-class function words appear frequently; for instance, as, yet, with, and both are all extremely positive, whereas since, have, though, and those have negative weight. Names also figure prominently, a problem noted by other researchers (Finn and Kushmerick 2003; Kennedy and Inkpen 2006). Perhaps most telling is the inclusion of unigrams like 2, video, tv, and series in the list of negative words. The polarity of these words actually makes some sense in context: Sequels and movies adapted from video games or TV series do tend to be less well-received than the average movie. However, these real-world facts are not the sort of knowledge a sentiment classifier ought to be learning; within the domain of movie reviews such facts are prejudicial, and in other domains (e.g., video games or TV shows) they are either irrelevant or a source of noise [30, 31].
Another area where the lexicon-based model might be preferable to a classifier model is in simulating the effect of linguistic context. On reading any document, it becomes apparent that aspects of the local context of a word need to be taken into account in SO assessment, such as negation (e.g., not good) and intensification (e.g., very good), aspects that Polanyi and Zaenen (2006) named contextual valence shifters. Research by Kennedy and Inkpen (2006) concentrated on implementing those insights. They dealt with negation and intensification by creating separate features, namely, the appearance of good might be either good (no modification), not_good (negated good), int_good (intensified good), or dim_good (diminished good). The classifier, however, cannot determine that these four types of good are in any way related, and so in order to train accurately there must be enough examples of all four in the training corpus. Moreover, we show in Section 2.4 that expanding the scope to two-word phrases does not deal with negation adequately, as it is often a long-distance phenomenon. Recent work has begun to address this issue. For instance, Choi and Cardie (2008) present a classifier that treats negation from a compositional point of view by first calculating polarity of terms independently, and then applying inference rules to arrive at a combined polarity score. As we shall see in Section 2, our lexicon-based model handles negation and intensification in a way that generalizes to all words that have a semantic orientation value [32].

A middle ground exists, however, with semi-supervised approaches to the problem. Read and Carroll (2009), for instance, use semi-supervised methods to build domain-independent polarity classifiers. Read and Carroll built different classifiers and show that they are more robust across domains. Their classifiers are, in effect, dictionary-based, differing only in the methodology used to build the dictionary. Li et al. (2010) use co-training to incorporate labeled and unlabeled examples, also making use of a distinction between sentences with a first person subject and with other subjects. Other hybrid methods include those of Andreevskaia and Bergler (2008), Dang, Zhang, and Chen (2010), Dasgupta and Ng (2009), Goldberg and Zhu (2006), or Prabowo and Thelwall (2009). Wan (2009) uses co-training in a method that uses English labeled data and an English classifier to learn a classifier for Chinese.
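The separate-feature treatment of negation and intensification discussed earlier in this section can be illustrated with a short sketch that rewrites a token sequence into features written here as not_good, int_good and dim_good. The shifter word lists and the function name are hypothetical, and this is not the cited authors' implementation; only the shifter immediately before a word is considered.

```python
# Hypothetical shifter word lists for illustration.
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very", "extremely"}
DIMINISHERS = {"somewhat", "slightly"}


def shifter_features(tokens):
    """Merge a valence shifter with the following word into one feature."""
    features = []
    prefix = None
    for token in tokens:
        if token in NEGATORS:
            prefix = "not"
        elif token in INTENSIFIERS:
            prefix = "int"
        elif token in DIMINISHERS:
            prefix = "dim"
        else:
            # Only the most recent shifter is kept, a known simplification.
            features.append(f"{prefix}_{token}" if prefix else token)
            prefix = None
    return features


negated = shifter_features("the food was not good".split())
intensified = shifter_features("the food was very good".split())
diminished = shifter_features("the food was somewhat good".split())
```

Because good, not_good, int_good and dim_good are four unrelated strings to a bag-of-words classifier, each one needs its own training examples, which is the limitation the text points out.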
In our approach, we seek methods that operate at a deep level of analysis, incorporating semantic orientation of individual words and contextual valence shifters, yet do not aim at a full linguistic analysis (one that involves analysis of word senses or argument structure), although further work in that direction is possible.
In this article, starting in Section 2, we describe the Semantic Orientation CALculator (SO-CAL) that we have developed over the last few years. We first extract sentiment-bearing words (including adjectives, verbs, nouns, and adverbs), and use them to calculate semantic orientation, taking into account valence shifters (intensifiers, downtoners, negation, and irrealis markers). We show that this lexicon-based method performs well, and that it is robust across domains and texts. One of the criticisms raised against lexicon-based methods is that the dictionaries are unreliable, as they are either built automatically or hand-ranked by humans (Andreevskaia and Bergler 2008). In Section 3, we present the results of several experiments that show that our dictionaries are robust and reliable, both against other existing dictionaries, and as compared to values assigned by humans (through the use of the Mechanical Turk interface) [33].
4.2 SO-CAL, the Semantic Orientation CALculator
Following Osgood, Suci, and Tannenbaum (1957), the calculation of sentiment in SO-CAL begins with two assumptions: that individual words have what is referred to as prior polarity, that is, a semantic orientation that is independent of context; and that said semantic orientation can be expressed as a numerical value. Several lexicon-based approaches have adopted these assumptions (Bruce and Wiebe 2000; Hu and Liu 2004; Kim and Hovy 2004). In this section, we describe the different dictionaries used in SO-CAL, and the incorporation of valence shifters. We conclude the section with tests that show SO-CAL's performance on different data sets.
Much of the early research in sentiment focused on adjectives or adjective phrases as the primary source of subjective content in a document (Hatzivassiloglou and McKeown 1997; Hu and Liu 2004; Taboada, Anthony, and Voll 2006), albeit with some exceptions, especially more recently, which have also included the use of adverbs (Benamara et al. 2007); adjectives and verbs (Kim and Hovy 2004); adjective phrases (Whitelaw, Garg, and Argamon 2005); two-word phrases (Turney 2002; Turney and Littman 2003); adjectives, verbs, and adverbs (Subrahmanian and Reforgiato 2008); the exclusive use of verbs (Sokolova and Lapalme 2008); the use of non-affective adjectives and adverbs (Sokolova and Lapalme 2009a, 2009b); or rationales, words and phrases selected by human annotators (Zaidan and Eisner 2008). In general, the SO of an entire document is the combined effect of the adjectives or relevant words found within, based upon a dictionary of word rankings (scores)34. The dictionary can be created in different ways: manually, using existing dictionaries such as the General Inquirer (Stone et al. 1966), or semiautomatically, making use of resources like WordNet (Hu and Liu 2004; Kim and Hovy 2004; Esuli and Sebastiani 2006). The dictionary may also be produced automatically via association, where the score for each new adjective is calculated using the frequency of the proximity of that adjective with respect to one or more seed words. Seed words are a small set of words with strong negative or positive associations, such as excellent or abysmal. In principle, a positive adjective should occur more frequently alongside the positive seed words, and thus will obtain a positive score, whereas negative adjectives will occur most often in the vicinity of negative seed words, thus obtaining a negative score. 
The association is usually calculated following Turney's method for computing mutual information (Turney 2002; Turney and Littman 2003), but see also Rao and Ravichandran (2009) and Velikovich et al. (2010) for other methods using seed words.
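The seed-word idea can be illustrated with a small sketch. It is not a reproduction of the cited method: it assumes that "proximity" means co-occurrence within the same sentence, it uses one seed word per polarity, and the function name, the smoothing constant, and the toy corpus are all hypothetical.

```python
import math

def so_by_association(word, sentences, pos_seed="excellent",
                      neg_seed="abysmal", smooth=0.01):
    """Log-ratio of how often `word` co-occurs with each seed word.

    A positive result means the word keeps company with the positive seed,
    a negative result with the negative seed. `smooth` (an assumption here)
    avoids division by zero when a count is empty.
    """
    near_pos = near_neg = n_pos = n_neg = 0
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        has_pos = pos_seed in tokens
        has_neg = neg_seed in tokens
        n_pos += has_pos          # sentences containing the positive seed
        n_neg += has_neg          # sentences containing the negative seed
        if word in tokens:
            near_pos += has_pos   # word and positive seed together
            near_neg += has_neg   # word and negative seed together
    return math.log2(((near_pos + smooth) * (n_neg + smooth)) /
                     ((near_neg + smooth) * (n_pos + smooth)))

# Toy corpus, invented for illustration only.
corpus = [
    "excellent and reliable service",
    "reliable excellent phone",
    "abysmal and flimsy case",
    "flimsy abysmal hinge",
]
print(so_by_association("reliable", corpus))
print(so_by_association("flimsy", corpus))
```

On this toy corpus, reliable receives a positive score and flimsy a negative one, which is the behavior the seed-word approach relies on.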
Previous versions of SO-CAL (Taboada and Grieve 2004; Taboada, Anthony, and Voll 2006) relied on an adjective dictionary to predict the overall SO of a document, using a simple aggregate-and-average method: The individual scores for each adjective in a document are added together and then divided by the total number of adjectives in that document. As we describe subsequently, the current version of SO-CAL takes other parts of speech into account, and makes use of more sophisticated methods to determine the true contribution of each word.
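A minimal sketch of the aggregate-and-average method follows. It assumes that only adjectives found in the dictionary are counted, and the dictionary entries and their values are hypothetical.

```python
# Toy adjective dictionary; entries and values are hypothetical.
ADJECTIVE_SO = {"good": 3, "excellent": 5, "bad": -3, "awful": -4}

def average_adjective_so(tokens, dictionary=ADJECTIVE_SO):
    """Add up the SO of every dictionary adjective and divide by their count."""
    scores = [dictionary[t] for t in tokens if t in dictionary]
    if not scores:
        return 0.0  # no known adjectives: treat the text as neutral
    return sum(scores) / len(scores)

review = "the plot was good but the ending was awful".split()
print(average_adjective_so(review))  # (3 + -4) / 2 = -0.5
```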
To build the system and run our experiments, we use the corpus described in Taboada and Grieve (2004) and Taboada, Anthony, and Voll (2006), which consists of a 400-text collection of Epinions reviews extracted from eight different categories: books, cars, computers, cookware, hotels, movies, music, and phones, a corpus we named Epinions 1. Within each collection, the reviews were split into 25 positive and 25 negative reviews, for a total of 50 in each category, and a grand total of 400 reviews in the corpus (279,761 words). We determined whether a review was positive or negative through the "recommended" or "not recommended" feature provided by the review's author.
4.2.2 Nouns, Verbs, and Adverbs
In the following example, adapted from Polanyi and Zaenen (2006), we see that lexical items other than adjectives can carry important semantic polarity information.
(1) The young man strolled+ purposefully+ through his neighborhood+.
(2) The teenaged male strutted- cockily- through his turf-.
Although the sentences have comparable literal meanings, the plus-marked nouns, verbs, and adverbs in Example (1) indicate the positive orientation of the speaker towards the situation, whereas the minus-marked words in Example (2) have the opposite effect. It is the combination of these words in each of the sentences that conveys the semantic orientation for the entire sentence.
In order to make use of this additional information, we created separate noun, verb, and adverb dictionaries, hand-ranked using the same +5 to -5 scale as our adjective dictionary. The enhanced dictionaries contain 2,252 adjective entries, 1,142 nouns, 903 verbs, and 745 adverbs. The SO-carrying words in these dictionaries were taken from a variety of sources, the three largest being Epinions 1, the 400-text corpus described in the previous section; a 100-text subset of the 2,000 movie reviews in the Polarity Dataset (Pang, Lee, and Vaithyanathan 2002; Pang and Lee 2004, 2005); and positive and negative words from the General Inquirer dictionary (Stone et al. 1966; Stone 1997). The sources provide a fairly good range in terms of register: The Epinions and movie reviews represent informal language, with words such as ass-kicking and nifty; at the other end of the spectrum, the General Inquirer was clearly built from much more formal texts, and contributed words such as adroit and jubilant, which may be more useful in the processing of literary reviews (Taboada, Gillies, and McFetridge 2006; Taboada et al. 2008) or other more formal texts 35.
Each of the open-class words was assigned a hand-ranked SO value between +5 and -5 (neutral or zero-value words were excluded) by a native English speaker. The dictionaries were later reviewed by a committee of three other researchers in order to minimize the subjectivity of ranking SO by hand. Examples are shown in Table 4.1.
Word SO Value
hate (noun and verb) -4
delay (noun and verb) -1
relish (verb) 4
Table 4.1: Examples of words in the noun and verb dictionaries.
One difficulty with nouns and verbs is that they often have both neutral and non-neutral connotations. In the case of inspire (or determination), there is a very positive meaning (Example (3)) as well as a rather neutral meaning (Example (4)).
(3) The teacher inspired her students to pursue their dreams.
(4) This movie was inspired by true events.
Except when one sense was very uncommon, the value chosen reflected an averaging across possible interpretations. In some cases, the verb and related noun have a different SO value. For instance, exaggerate is -1, whereas exaggeration is -2, and the same values are applied to complicate and complication, respectively. We find that grammatical metaphor (Halliday 1985), that is, the use of a noun to refer to an action, adds a more negative connotation to negative words 36.
All nouns and verbs encountered in the text are lemmatized and the form (singular or plural, past tense or present tense) is not taken into account in the calculation of SO value. As with the adjectives, there are more negative nouns and verbs than positive ones.
Word SO Value
Table 4.2: Examples from the adverb dictionary.
The adverb dictionary was built automatically using our adjective dictionary, by matching adverbs ending in -ly to their potentially corresponding adjective, except for a small selection of words that were added or modified by hand. When SO-CAL encountered a word tagged as an adverb that was not already in its dictionary, it would stem the word and try to match it to an adjective in the main dictionary. This worked quite well for most adverbs 37, resulting in semantic orientation values that seem appropriate (see examples in Table 4.2).
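The adverb lookup can be sketched as follows. The stemming rules shown are a simplified assumption, not the stemmer SO-CAL actually uses, and all dictionary entries and values are hypothetical.

```python
# Hypothetical dictionaries for illustration.
ADJECTIVE_SO = {"happy": 3, "terrible": -4, "purposeful": 2}
ADVERB_SO = {"well": 2}  # stands in for the hand-added adverb entries

def adjective_candidates(adverb):
    """Yield plausible adjective forms for an adverb ending in -ly."""
    if not adverb.endswith("ly"):
        return
    stem = adverb[:-2]
    yield stem                  # purposefully -> purposeful
    if stem.endswith("i"):
        yield stem[:-1] + "y"   # happily -> happy
    yield stem + "le"           # terribly -> terrible

def adverb_so(adverb):
    """Use the adverb dictionary first, then fall back to a matching adjective."""
    if adverb in ADVERB_SO:
        return ADVERB_SO[adverb]
    for candidate in adjective_candidates(adverb):
        if candidate in ADJECTIVE_SO:
            return ADJECTIVE_SO[candidate]
    return None  # no SO value could be derived

for word in ["purposefully", "happily", "terribly", "well", "quickly"]:
    print(word, adverb_so(word))
```

Returning None for unmatched adverbs keeps them out of the calculation rather than scoring them as zero, which is one possible design choice among several.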
Quirk et al. (1985) classify intensifiers into two major categories, depending on their polarity: Amplifiers (e.g., very) increase the semantic intensity of a neighboring lexical item, whereas downtoners (e.g., slightly) decrease it. Some researchers in sentiment analysis (Kennedy and Inkpen 2006; Polanyi and Zaenen 2006) have implemented intensifiers using simple addition and subtraction; that is, if a positive adjective has an SO value of 2, an amplified adjective would have an SO value of 3, and a downtoned adjective an SO value of 1. One problem with this kind of approach is that it does not account for the wide range of intensifiers within the same subcategory. Extraordinarily, for instance, is a much stronger amplifier than rather. Another concern is that the amplification of already loud items should involve a greater overall increase in intensity when compared to more subdued counterparts (compare truly fantastic with truly okay); in short, intensification should also depend on the item being intensified. In SO-CAL, intensification is modeled using modifiers, with each intensifying word having a percentage associated with it; amplifiers are positive, whereas downtoners are negative, as shown in Table 4.3. For example, if sleazy has an SO value of -3, somewhat sleazy would have an SO value of: -3 × (100% - 30%) = -2.1. If excellent has an SO value of 5, most excellent would have an SO value of: 5 × (100% + 100%) = 10. Intensifiers are applied recursively, starting from the closest to the SO-valued word: If good has an SO value of 3, then really very good has an SO value of (3 × [100% + 25%]) × (100% + 15%) = 4.3. Besides adverbs and adjectives, other intensifiers are quantifiers (a great deal of).
We also included three other kinds of intensification that are common within our genre: the use of all capital letters, the use of exclamation marks, and the use of the discourse connective but to indicate more salient information (e.g., …but the movie was GREAT!). In all, our intensifier dictionary contains 177 entries, some of them multi-word expressions.
Intensifier Modifier (%)
(the) most +100
Table 4.3: Percentages for some intensifiers.
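The percentage-based intensification described in this section can be sketched as below. The percentages are the ones used in the worked examples in the text; the dictionary layout and function name are assumptions, and sleazy is taken to be a negative word with an SO value of -3.

```python
# Modifier percentages from the worked examples, written as fractions.
INTENSIFIERS = {"somewhat": -0.30, "really": 0.15, "very": 0.25, "most": 1.00}

def intensify(so_value, modifiers):
    """Apply each modifier as a multiplicative percentage.

    `modifiers` must be ordered from the one closest to the SO-valued
    word outwards, so "really very good" is passed as ["very", "really"].
    """
    for modifier in modifiers:
        so_value *= 1.0 + INTENSIFIERS[modifier]
    return so_value

print(round(intensify(-3, ["somewhat"]), 1))        # somewhat sleazy: -2.1
print(round(intensify(5, ["most"]), 1))             # most excellent: 10.0
print(round(intensify(3, ["very", "really"]), 1))   # really very good: 4.3
```

Because each modifier multiplies the current value, the same intensifier moves a strong word further than a weak one, which is the property that simple addition and subtraction lack.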
4.2.4 Text-Level Features
Lexicon-based sentiment classifiers generally show a positive bias (Kennedy and Inkpen 2006), likely the result of a universal human tendency to favor positive language (Boucher and Osgood 1969). In order to overcome this problem, Voll and Taboada (2007) implemented normalization, shifting the numerical cut-off point between positive and negative reviews. In the current version of SO-CAL, we have used a somewhat different approach, instead supposing that negative expressions, being relatively rare, are given more cognitive weight when they do appear. Thus we increase the final SO of any negative expression (after other modifiers have been applied) by a fixed amount (currently 50%). This seems to have essentially the same effect in our experiments, and is more theoretically satisfying.
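As a rough sketch of the negative weighting, assuming it is applied as a multiplication by 1.5 to each negative expression score (the list of scores below is hypothetical):

```python
NEGATIVE_WEIGHT = 1.5  # the 50% increase mentioned in the text

def weight_negative(so_value, weight=NEGATIVE_WEIGHT):
    """Scale negative SO values by `weight`; leave other values unchanged."""
    return so_value * weight if so_value < 0 else so_value

# Hypothetical final scores of the expressions found in one review.
expression_scores = [3, -2, 1, -1]
adjusted = [weight_negative(score) for score in expression_scores]

print(sum(expression_scores))  # raw total
print(sum(adjusted))           # total after negative weighting
```

In this invented case the raw total is slightly positive while the weighted total is slightly negative, which shows how the adjustment counteracts a positive bias.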
Pang, Lee, and Vaithyanathan (2002) found that their machine-learning classifier performed better when a binary feature was used indicating the presence of a unigram in the text, instead of a numerical feature indicating the number of appearances. Counting each word only once does not seem to work equally well for word-counting models. We have, however, improved overall performance by decreasing the weight of words that appear more often in the text: The nth appearance of a word in the text will have only 1/n of its full SO value.
Consider the following invented example:
Overall, the film was excellent. The acting was excellent, the plot was excellent, and the direction was just plain excellent.
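The 1/n weighting can be checked against this example with a short sketch. It assumes a one-word dictionary in which excellent has an SO value of 5, as in the intensification examples, and a naive lowercase tokenizer; both are simplifications.

```python
import re
from collections import Counter

def repetition_weighted_so(text, dictionary):
    """Sum SO values, giving the nth appearance of a word 1/n of its value."""
    seen = Counter()
    total = 0.0
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in dictionary:
            seen[token] += 1
            total += dictionary[token] / seen[token]
    return total

text = ("Overall, the film was excellent. The acting was excellent, "
        "the plot was excellent, and the direction was just plain excellent.")
dictionary = {"excellent": 5}  # assumed SO value

# 5 + 5/2 + 5/3 + 5/4, compared with 4 * 5 = 20 without the weighting
print(round(repetition_weighted_so(text, dictionary), 2))
```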
4.2.5 Other Features of SO-CAL