Accuracy

We are still trying to figure out how well our system for deciding whether e-mails are spam performs. In the last post we ended up with a confusion matrix like this:

                          Actual label
                          Spam                       NonSpam
Predicted label  Spam     1 (true positives, TP)     3 (false positives, FP)
                 NonSpam  2 (false negatives, FN)    4 (true negatives, TN)

Now we want to calculate numbers from this table that describe the performance of our system. One easy way of doing this is the accuracy A. Accuracy describes what percentage of decisions we got right: we take the diagonal entries of the matrix (the true positives and true negatives) and divide by the total number of entries. Formally:
A = (TP+TN)/(TP+TN+FP+FN)

In our example the accuracy is:
A = (1+4)/(1+4+2+3) = 5/10 = 0.5
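
As a quick sanity check, here is a minimal Python sketch of the same calculation (the counts are hard-coded from the confusion matrix above):

    # Counts from the confusion matrix above.
    tp, fp, fn, tn = 1, 3, 2, 4

    # Accuracy: correct decisions (the diagonal) divided by all decisions.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    print(accuracy)  # 0.5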

Using accuracy is fine in examples like the one above, where both classes occur with more or less the same frequency. But often one class is more frequent than the other by many orders of magnitude, so the number of true negatives dwarfs the other entries. Let’s assume 994 true negatives instead of 4; when we calculate the accuracy again, we get this:
A = (1+994)/(1+994+2+3) = 995/1000 = 0.995
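
Extending the sketch from above, we can also compute what a trivial classifier that always predicts NonSpam would score on the same 1000 mails (again just a sketch, with the counts assumed as above):

    # Imbalanced counts: 994 true negatives instead of 4.
    tp, fp, fn, tn = 1, 3, 2, 994
    print((tp + tn) / (tp + tn + fp + fn))  # 0.995

    # A classifier that always says NonSpam: all 3 actual spam mails
    # become false negatives, all 997 actual non-spam mails become
    # true negatives.
    tp, fp, fn, tn = 0, 0, 3, 997
    print((tp + tn) / (tp + tn + fp + fn))  # 0.997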

It doesn’t really matter any more whether we correctly identify any spam mails. Even if we always say NonSpam, so that we get zero spam mails right, we still get nearly the same accuracy as above (997/1000 = 0.997). So accuracy is not a good indicator of performance for our system in this situation. In the next post we will look at other measures we can use instead.

Link for a second explanation: Explanation from an Information Retrieval perspective

Settings

Ubuntu / Gnome settings:

  • System settings / Appearance / Behavior: check “Enable workspaces”, show the menus “in the window’s title bar”, menu visibility “always displayed”.
  • System settings / Regional format: Change to “English (Ireland)”.
  • System settings / Bluetooth: Turn off.
  • System settings / Details / Removable media: set all to “Ask what to do”.
  • System settings / Time & Date / Clock: check “Weekday”, “date and month”, “24-hour time”, “include week numbers”.
  • System settings / Display: turn off “Sticky edges”, check “Launcher on all displays”.
  • System settings / Text entry: set to “Allow different sources for each window” and “new windows use the default source”.
  • Unity tweak tool / Hotcorners: turn on, set the upper left corner to “Window spread”.

Suse, Kubuntu / KDE settings:

  • Settings / Desktop Behaviour / Desktop effects – deactivate “Fade”, “Blur”, “Translucency”.
  • Settings / Desktop Behaviour / Accessibility – deactivate “use system bell” in “audible bell”
  • Settings / Account Details / KDE Wallet – deactivate
  • Settings / Input devices / Keyboard – configure English keyboard
  • Settings / Input devices / Mouse / General – set “double click to open files”
  • Settings / Task Manager Settings / General – Sorting “manually”, Grouping “do not group”, mark “show only tasks from the current desktop”
  • Settings / Startup and Shutdown / Desktop session – On startup “start with an empty session”
  • Panel – Remove “Show Desktop” widget, add “Quick launcher” widget.

Firefox settings:

  • General: check “Make Firefox your default browser”, “Always ask me where to save files”, “Open new windows in a new tab instead”.
  • Search: uncheck “Provide search suggestions”.
  • Applications: change pdf to “Always ask”.
  • Privacy: “Use custom settings for history”, uncheck “Remember search and form history”, keep cookies until “I close Firefox”.
  • Security: uncheck “Remember logins for sites”.
  • Advanced / General: check “Search for text when I start typing”, uncheck “Check my spelling as I type”.
  • In about:config: set “browser.bookmarks.showRecentlyBookmarked” to “false”.

Thunderbird settings:

  • Enable menu bar
  • Preferences / General: uncheck “When Thunderbird launches show start page”, uncheck “play a sound when new message arrives”.
  • Preferences / Display / Advanced: check “Close message window/tab on move or delete”, uncheck “Show only display name for people in my address book”.
  • Preferences / Composition / Spelling: uncheck “Enable spell check as you type”.
  • Preferences / Privacy: Uncheck “Accept cookies from sites”, check “Tell sites that I do not want to be tracked”.
  • View / Layout: uncheck “Message pane”
  • View / Today pane: uncheck “Show”
  • Account settings / Copies and Folders: change “Place a copy in”, check “Place replies in the folder of message”.
  • Account settings / Composition: uncheck “Compose messages in HTML format.”
  • Install Enigmail and import keys.
  • Install Lightning and import calendars.

Pidgin settings:

  • Preferences / Interface: set “Hide new IM conversations” to “Never”, set “New conversations” to “New window”, set “Show system tray icon” to “Always”.
  • Preferences / Conversations: uncheck “show formatting”, uncheck “buddy animation”, uncheck “highlight misspelled words”, uncheck “resize smileys”.
  • Preferences / Sounds: check “Mute sounds”
  • Preferences / Status: set “Idle time” to “Never”, uncheck “change to this status”, set “startup status” to “available”.
  • Plugins: enable “Message Notification”, “Message Timestamp Formats”.
  • Buddies / Show: check “Offline Buddies”, “Empty groups”.
  • Install Skype plugin

Atom settings:

  • Core settings: uncheck “Audio beep”, set “Restore previous windows on start” to “no”.
  • Editor: check “Scroll past end”, check “Soft wrap at preferred line length”.
  • Themes: Set to “Atom light”
  • Install Packages:
    • atom-latex (custom toolchain %TEX %ARG %DOC, add *.synctex.gz for cleaning, save files before build)
    • script
    • minimap
    • linter-flake8
  • Disable packages: autocomplete-plus

Konsole/Terminal settings:

  • TabBar: check “Show New Tab and Close Tab buttons”
  • Profile / Scrolling: “Unlimited Scrollback”

Confusion matrix

Let’s say we want to analyze e-mails to determine whether they are spam or not. We have a set of mails, and for each of them we have a label that says either "Spam" or "NonSpam" (for example, we could get these labels from users who mark mails as spam). On this set of documents (the training data) we can train a machine learning system which, given an e-mail, predicts the label. Now we want to know how well the system that we have trained performs, i.e. whether it really recognizes spam.

So how can we find out? We take another set of mails that have been marked as "Spam" or "NonSpam" (the test data), apply our machine learning system, and get predicted labels for these documents. We end up with a list like this:

           Actual label    Predicted label
Mail 1     Spam            NonSpam
Mail 2     NonSpam         NonSpam
Mail 3     NonSpam         NonSpam
Mail 4     Spam            Spam
Mail 5     NonSpam         NonSpam
Mail 6     NonSpam         NonSpam
Mail 7     Spam            NonSpam
Mail 8     NonSpam         Spam
Mail 9     NonSpam         Spam
Mail 10    NonSpam         Spam

We can now compare the predicted labels from our system to the actual labels to find out how many of them we got right. With two classes, there are four possible outcomes for the comparison of a predicted label and an actual label. We could have predicted "Spam" and the actual label is also "Spam", or we predicted "NonSpam" and the label is actually "NonSpam". In both of these cases we were right, so these are the true predictions. But we could also have predicted "Spam" when the actual label is "NonSpam", or "NonSpam" when we should have predicted "Spam". These are the false predictions, the cases where we were wrong.

Let’s assume that we are interested in how well we can predict "Spam". Every mail for which we have predicted the class "Spam" is a positive prediction, a prediction for the class we are interested in. Every mail where we have predicted "NonSpam" is a negative prediction, a prediction of something other than the class we are interested in. So we can summarize the possible outcomes and their names in this table:

                          Actual label
                          Spam                    NonSpam
Predicted label  Spam     true positives (TP)     false positives (FP)
                 NonSpam  false negatives (FN)    true negatives (TN)

The true positives are the mails where we have predicted "Spam", the class we are interested in (a positive prediction), and the actual label was also "Spam", so the prediction was true. The false positives are the mails where we have predicted "Spam" (a positive prediction) but the actual label is "NonSpam", so the prediction is false. Correspondingly, the false negatives are the mails we should have labeled as "Spam" but didn’t, and the true negatives are the mails we correctly recognized as "NonSpam". This matrix is called a confusion matrix.
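
To make the four cases concrete, here is a small Python sketch; the function name and signature are just for illustration, not part of any library:

    def outcome(predicted, actual, positive="Spam"):
        # Positive prediction: we predicted the class we are interested in.
        if predicted == positive:
            return "TP" if actual == positive else "FP"
        # Negative prediction: we predicted the other class.
        return "FN" if actual == positive else "TN"

    print(outcome("Spam", "Spam"))        # TP
    print(outcome("Spam", "NonSpam"))     # FP
    print(outcome("NonSpam", "Spam"))     # FN
    print(outcome("NonSpam", "NonSpam"))  # TN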

Let’s create the confusion matrix for the table with the ten mails that we classified above. Mail 1 is "Spam", but we predicted "NonSpam", so this is a false negative. Mail 2 is "NonSpam" and we predicted "NonSpam", so this is a true negative. And so on. We end up with this table:

                          Actual label
                          Spam     NonSpam
Predicted label  Spam     1        3
                 NonSpam  2        4
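
We can reproduce these counts with a few lines of Python, reusing the idea from the sketch above (again only a sketch; the label pairs are hard-coded from the table of ten mails):

    from collections import Counter

    # (actual, predicted) pairs for mails 1-10, copied from the table above.
    mails = [
        ("Spam", "NonSpam"), ("NonSpam", "NonSpam"), ("NonSpam", "NonSpam"),
        ("Spam", "Spam"), ("NonSpam", "NonSpam"), ("NonSpam", "NonSpam"),
        ("Spam", "NonSpam"), ("NonSpam", "Spam"), ("NonSpam", "Spam"),
        ("NonSpam", "Spam"),
    ]

    def outcome(actual, predicted, positive="Spam"):
        if predicted == positive:
            return "TP" if actual == positive else "FP"
        return "FN" if actual == positive else "TN"

    counts = Counter(outcome(a, p) for a, p in mails)
    print(counts)  # Counter({'TN': 4, 'FP': 3, 'FN': 2, 'TP': 1})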

In the next post we will take a look at how we can calculate performance measures from this table.

Link for a second explanation: Explanation from an Information Retrieval perspective