Levels of sentiment analysis

Sentiment analysis can be done at different levels of granularity. We could determine the sentiment of a complete review, which would give us something similar to the star ratings. We can then go down to a more detailed level, e.g., to sentences or to aspects.

Document-level analysis
Input: Text of a complete document
Output: Polarity (as a label positive/negative/neutral or rating on a scale)

This task can be done fairly reliably with automatic methods, as there is usually some redundancy: if the method misses one clue, there are other clues that are sufficient to determine which polarity is expressed. Document-level analysis makes a few assumptions that may not be true. First, it assumes that a document talks about a single target and that all opinion expressions refer to this target. This may be true in some cases, but especially in longer reviews, people like to compare the product they are discussing to similar products, describe the plot of a movie or book, give opinions about the delivery, tell stories about how they got the product as a gift, and so on. Second, it assumes that one document expresses one opinion, but human authors may be undecided. Finally, it assumes that the complete review expresses the opinion of one author, but there may be parts where other people’s opinions are cited (for completeness or to refute them).
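To make the redundancy argument concrete, here is a minimal sketch of a document-level classifier that simply counts clue words. The word lists and the function name `document_polarity` are made up for illustration; a real system would use a proper sentiment lexicon or a trained model.

```python
import re

# Tiny hand-made word lists, purely illustrative (not a real sentiment lexicon).
POSITIVE = {"good", "great", "liked", "love", "smooth"}
NEGATIVE = {"bad", "horrible", "hated", "poor", "broken"}


def document_polarity(text):
    """Label a whole document by counting positive and negative clue words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Several clues point the same way, so one negative clue does not flip the label.
mostly_positive = document_polarity(
    "Great phone, I liked the screen. The battery is bad."
)

# One positive and one negative clue cancel out: the single label hides
# the fact that the author holds two different opinions.
mixed = document_polarity("I liked the UI, but the ring tones were horrible")
```

In the second example the counts cancel out and the document is labeled neutral, which illustrates the "one document, one opinion" assumption breaking down.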

Sentence-level analysis
Input: One sentence
Output: Polarity + target

At the sentence level, we can add the task of finding out what the sentence talks about (the target) to the task of determining the sentiment. While this level of analysis allows us in some cases to overcome the difficulties we discussed at the document level, it still makes the same assumptions for each sentence. All of them may be false even in a single sentence: there may be more than one target ("A is better than B"), more than one opinion ("I liked the UI, but the ring tones were horrible"), or opinions of more than one person ("I liked the size, but my wife hated it").

Aspect-level analysis
Input: A document or sentence
Output: many tuples of (polarity, target, possibly holder)

Instead of using the linguistic units of a sentence or a document, we can use individual opinion expressions as the main unit of what we want to extract. The sentence "I liked the UI, but my wife thought the ring tones were horrible" would result in two tuples: (positive, UI, author) and (negative, ring tones, my wife). The different tuples can then be aggregated to get one polarity per aspect, per holder, or even an overall polarity.
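As a rough sketch of this aggregation step, the tuples from the example sentence can be summed per aspect, per holder, or overall. The numeric values (+1, -1, 0) and the helper name `aggregate` are assumptions made for illustration, and the extraction of the tuples themselves is not shown.

```python
from collections import defaultdict

# Assumed numeric encoding of the polarity labels.
POLARITY_VALUE = {"positive": 1, "negative": -1, "neutral": 0}

# Position of each field in a (polarity, target, holder) tuple.
FIELD_INDEX = {"target": 1, "holder": 2}


def aggregate(opinions, key):
    """Sum polarity values of opinion tuples, grouped by 'target' or 'holder'."""
    index = FIELD_INDEX[key]
    totals = defaultdict(int)
    for opinion in opinions:
        totals[opinion[index]] += POLARITY_VALUE[opinion[0]]
    return dict(totals)


# The two tuples from the example sentence.
opinions = [
    ("positive", "UI", "author"),
    ("negative", "ring tones", "my wife"),
]

per_aspect = aggregate(opinions, "target")
per_holder = aggregate(opinions, "holder")
overall = sum(POLARITY_VALUE[polarity] for polarity, _, _ in opinions)
```

Summing is only one possible choice; averaging or majority voting would work with the same tuple structure.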

What is sentiment analysis?

People have always been interested in other people's opinions before making a decision, traditionally by asking friends or reading surveys or professional reviews published in a magazine. In recent years, huge amounts of opinions have become available on the web in the form of consumer reviews, blogs, forum posts or Twitter, and they are widely used in decision making:

(c) Randall Munroe, https://www.xkcd.com/1036/

Companies have caught on to this, and opinions have become a topic of interest for many of them in recent years. Because of the huge amount of available opinions, it is impossible for a human to read them all, so an automatic method of analyzing them is necessary: sentiment analysis is born (sometimes also called subjectivity analysis or opinion mining). In its most basic form, sentiment analysis attempts to determine whether a given text expresses an opinion and whether this opinion is positive or negative.

You might think that we could just use the star ratings that usually accompany reviews to perform the same task. If we only want to know the sentiment of a complete review, this would indeed give us more or less the same information. But reviews usually contain much more detailed information:

(c) Randall Munroe, https://www.xkcd.com/937/

While it is nice that the app has a good UI and runs smoothly, a user might want to assign more weight to the review that discusses the aspect of "warning about a tornado". After all, this is the main functionality of the app and probably the main reason for getting it. Everything else is an added bonus. Besides identifying reviews that discuss the important aspects of an item, real reviews usually also contain opinions on a variety of aspects that may be evaluated quite differently by different users ("I liked the UI, but I hated the alarm tones"). So a more detailed analysis is necessary, and this is what makes sentiment analysis interesting!

Amazon Review Downloader

If you do sentiment analysis at the document level, there are huge amounts of data annotated with star ratings available on Amazon and similar pages. In theory. In practice, to get this data, you need to crawl Amazon pages, download the reviews and parse the HTML to extract the individual reviews. And this would be the n-th time somebody wrote a script to do that. So, to save you the waste of time, Andrea Esuli kindly offers some scripts to download Amazon reviews and convert them to a CSV file. Thank you! You can find them on Andrea Esuli’s web page.
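Once the reviews are in a CSV file, the star ratings can be turned into document-level polarity labels. The sketch below is only an illustration: the column names `rating` and `text` are hypothetical (check the header of the file the scripts actually produce), the sample rows are invented, and the thresholds (4 to 5 stars positive, 1 to 2 stars negative, 3 stars neutral) are an assumed convention.

```python
import csv
import io


def stars_to_label(stars):
    """Map a star rating to a polarity label (thresholds are an assumption)."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"


def load_labeled_reviews(csv_file):
    """Read (text, label) pairs from an open CSV file.

    Assumes hypothetical columns named 'rating' and 'text'.
    """
    reader = csv.DictReader(csv_file)
    return [
        (row["text"], stars_to_label(float(row["rating"])))
        for row in reader
    ]


# Invented sample rows standing in for a downloaded file.
sample = io.StringIO(
    "rating,text\n"
    "5,Works great\n"
    "1,Broke after a day\n"
    "3,It is okay\n"
)
labeled = load_labeled_reviews(sample)
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.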