\times 4/3 { a8( b c) }

And as of Lilypond 2.17:

\tuplet 4/3 { a8( b c) }

In choir scores, you often have the score for two voices (e.g., soprano and alto) in one line:

\new Staff = "Frauen" <<
  \new Voice = "Sopran" { \voiceOne \global \soprano }
  \new Voice = "Alt" { \voiceTwo \global \alto }
>>

When they both have a pause at the same time with the same length, lilypond will still print two rests in different positions. If you (like me) think this looks weird, here is how you can change it:

soprano = \relative c' { a2 \oneVoice r4 \voiceOne a4 }
alto = \relative c' { a2 s4 a4 }

In one voice, switch to a single voice with `\oneVoice` for the rest and then back to the usual voice, here with `\voiceOne`. If you do the same in the other voice, you will get warnings about clashing notes, so instead of a rest, use an invisible rest (spacer) with `s`.

An alternative is the following command, which causes all rests to appear in the middle of the line. It should be used inside the `\layout` block:

\override Voice.Rest #'staff-position = #0

In the last post we discussed accuracy, a straightforward method of calculating the performance of a classification system. Using accuracy is fine when the classes are of roughly equal size, but this is often not the case in real-world tasks. In such cases the very large number of true negatives outweighs the number of true positives in the evaluation, so accuracy will always be artificially high.

Luckily there are performance measures that ignore the number of true negatives. Two frequently used measures are precision and recall. **Precision P** indicates how many of the items that we have identified as positives are really positives. In other words, how precise have we been in our identification. How many of those that we think are X, really are X. Formally, this means that we divide the number of true positives by the number of all identified positives (true and false):
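In symbols (a reconstruction of the formula, using the TP/FP abbreviations from the confusion matrix post):

```latex
P = \frac{TP}{TP + FP}
```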

**Recall R** indicates how many of the real positives we have found. So from all of the positive items that are there, how many did we manage to identify. In other words, how exhaustive we were. Formally, this means that we divide the number of true positives by the number of all existing positives (true positives and false negatives):
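In symbols (again a reconstruction, with the same abbreviations):

```latex
R = \frac{TP}{TP + FN}
```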

For our example from the last post, precision and recall are as follows:
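With TP = 1, FP = 3 and FN = 2 from the last post's confusion matrix, the numbers work out to:

```latex
P = \frac{1}{1 + 3} = 0.25 \qquad R = \frac{1}{1 + 2} \approx 0.33
```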

It is easy to get a recall of 100%: we just say for everything that it is a positive. But as this will probably not be the case (or else we have a really easy dataset to classify!), this approach will give us a really low precision. On the other hand, we can usually get a high precision if we classify as positive only one single item that we are really, really sure about. But if we do that, recall will be low, as there will be more than one positive item in the dataset (or else it is not a very meaningful set).

So recall and precision are in a sort of balance. The **F1 score** or **F1 measure** is a way of putting the two of them together to produce one single number. Formally it calculates the harmonic mean of the two numbers and weights the two of them with the same importance (there are other variants that put more importance on one of them):
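The equal-weights variant described above, written out:

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R}
```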

Using the values for precision and recall for our example, F1 is:
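Plugging in P = 1/4 and R = 1/3 from above:

```latex
F_1 = \frac{2 \cdot \frac{1}{4} \cdot \frac{1}{3}}{\frac{1}{4} + \frac{1}{3}} = \frac{2}{7} \approx 0.29
```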

Intuitively, F1 is between the two values of precision and recall, but closer to the lower of the two. In other words, it penalizes if we concentrate only on one of the values and rewards systems where precision and recall are closer together.
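To make this concrete, here is a small Python sketch (the function name is my own, not from any library) that reproduces the numbers from the example:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts from the spam example: TP = 1, FP = 3, FN = 2
p, r, f1 = precision_recall_f1(1, 3, 2)
# F1 (~0.29) lies between P (0.25) and R (~0.33), closer to the lower value
print(p, r, f1)
```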

Link for a second explanation: Explanation from an Information Retrieval perspective

We are still trying to figure out how good our system for determining whether e-mails are spam or not is. In the last post we ended up with a *confusion matrix* like this:

| | Actual: Spam | Actual: NonSpam |
|---|---|---|
| Predicted: Spam | 1 (true positives, TP) | 3 (false positives, FP) |
| Predicted: NonSpam | 2 (false negatives, FN) | 4 (true negatives, TN) |

Now we want to calculate numbers from this table to describe the performance of our system. One easy way of doing this is to use **accuracy A**. Accuracy basically describes which percentage of decisions we got right. So we would take the diagonal entries in the matrix (the true positives and true negatives) and divide by the total number of entries. Formally:
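Written out (a reconstruction of the formula from the description above):

```latex
A = \frac{TP + TN}{TP + TN + FP + FN}
```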

In our example the accuracy is:
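With the counts from the confusion matrix (TP = 1, FP = 3, FN = 2, TN = 4):

```latex
A = \frac{1 + 4}{1 + 3 + 2 + 4} = \frac{5}{10} = 0.5
```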

Using accuracy is fine in examples like the above, where both classes occur with more or less the same frequency. But frequently the number of true negatives is larger than the number of true positives by many orders of magnitude. So let’s assume 994 true negatives instead of 4; when we calculate accuracy again, we get this:
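Keeping TP = 1, FP = 3, FN = 2 and assuming TN = 994 (so 1000 mails in total):

```latex
A = \frac{1 + 994}{1 + 3 + 2 + 994} = \frac{995}{1000} = 0.995
```

And if we always predicted NonSpam, the 3 actual spam mails would all become false negatives, still giving A = 997/1000 = 0.997.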

It doesn’t really matter if we correctly identify any spam mails. Even if we always say NonSpam, so we get zero spam mails right, we still get nearly the same accuracy as above. So accuracy is not a good indicator of performance for our system in this situation. In the next post we will look at other measures we can use instead.

Link for a second explanation: Explanation from an Information Retrieval perspective

Let’s say we want to analyze e-mails to determine whether they are spam or not. We have a set of mails and for each of them we have a label that says either "Spam" or "NonSpam" (for example we could get these labels from users who mark mails as spam). On this set of documents (the *training data*) we can train a machine learning system which, given an e-mail, can predict the label. So now we want to know how well the system that we have trained is performing, whether it really recognizes spam or not.

So how can we find out? We take another set of mails that have been marked as "Spam" or "NonSpam" (the *test data*), apply our machine learning system, and get predicted labels for these documents. So we end up with a list like this:

| | Actual label | Predicted label |
|---|---|---|
| Mail 1 | Spam | NonSpam |
| Mail 2 | NonSpam | NonSpam |
| Mail 3 | NonSpam | NonSpam |
| Mail 4 | Spam | Spam |
| Mail 5 | NonSpam | NonSpam |
| Mail 6 | NonSpam | NonSpam |
| Mail 7 | Spam | NonSpam |
| Mail 8 | NonSpam | Spam |
| Mail 9 | NonSpam | Spam |
| Mail 10 | NonSpam | Spam |

We can now compare the predicted labels from our system to the actual labels to find out how many of them we got right. When we have two classes, there are four possible outcomes for the comparison of a predicted label and an actual label. We could have predicted "Spam" and the actual label is also "Spam". Or we predicted "NonSpam" and the label is actually "NonSpam". In both of these cases we were right, so these are the *true* predictions. But, we could also have predicted "Spam" when the actual label is "NonSpam". Or "NonSpam" when we should have predicted "Spam". So these are the *false* predictions, the cases where we have been wrong. Let’s assume that we are interested in how well we can predict "Spam". Every mail for which we have predicted the class "Spam" is a *positive* prediction, a prediction *for* the class we are interested in. Every mail where we have predicted "NonSpam" is a *negative* prediction, a prediction of *not* the class we are interested in. So we can summarize the possible outcomes and their names in this table:

| | Actual: Spam | Actual: NonSpam |
|---|---|---|
| Predicted: Spam | true positives (TP) | false positives (FP) |
| Predicted: NonSpam | false negatives (FN) | true negatives (TN) |

The *true positives* are the mails where we have predicted "Spam", the class we are interested in (a *positive* prediction), and the actual label was also "Spam", so the prediction was *true*. The *false positives* are the mails where we have predicted "Spam" (a *positive* prediction), but the actual label is "NonSpam", so the prediction is *false*. Correspondingly, the *false negatives* are the mails we should have labeled as "Spam" but didn’t, and the *true negatives* are the mails we correctly recognized as "NonSpam". This matrix is called a *confusion matrix*.

Let’s create the confusion matrix for the table with the ten mails that we classified above. Mail 1 is "Spam", but we predicted "NonSpam", so this is a false negative. Mail 2 is "NonSpam" and we predicted "NonSpam", so this is a true negative. And so on. We end up with this table:

| | Actual: Spam | Actual: NonSpam |
|---|---|---|
| Predicted: Spam | 1 | 3 |
| Predicted: NonSpam | 2 | 4 |
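The counting we just did by hand can be sketched in a few lines of Python (the helper function is my own, not from the post):

```python
def confusion_counts(actual, predicted, positive="Spam"):
    """Count TP, FP, FN, TN for the class we are interested in."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:       # positive prediction
            if a == positive: tp += 1
            else:             fp += 1
        else:                   # negative prediction
            if a == positive: fn += 1
            else:             tn += 1
    return tp, fp, fn, tn

# The ten mails from the table above
actual    = ["Spam", "NonSpam", "NonSpam", "Spam", "NonSpam",
             "NonSpam", "Spam", "NonSpam", "NonSpam", "NonSpam"]
predicted = ["NonSpam", "NonSpam", "NonSpam", "Spam", "NonSpam",
             "NonSpam", "NonSpam", "Spam", "Spam", "Spam"]
print(confusion_counts(actual, predicted))  # → (1, 3, 2, 4)
```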

In the next post we will take a look at how we can calculate performance measures from this table.

Link for a second explanation: Explanation from an Information Retrieval perspective

I am learning R, so this is my first attempt to create histograms in R. The data that I have is a vector of one category for each data point. For this example we will use a vector of a random sample of letters. The important thing is that we want a histogram of the frequencies of texts, not numbers. And the texts are longer than just one letter. So let’s start with this:

labels <- sample(letters[1:20], 100, replace=TRUE)
# Repeat each letter 10 times to get longer text labels
labels <- vapply(seq_along(labels), function(x) paste(rep(labels[x], 10), collapse=""), character(1L))
library(plyr)  # for the function 'count'
distribution <- count(labels)
distribution_sorted <- distribution[order(distribution[,"freq"], decreasing=TRUE),]

I use the function `count` from the package `plyr` to get a matrix `distribution` with the different categories in column one (called "x") and the number of times each label occurs in column two (called "freq"). As I would like the histogram to display the categories from the most frequent to the least frequent one, I then sort this matrix by frequency with the function `order`. The function gives back a vector of indices in the correct order, so I need to plug this into the original matrix as row numbers.

Now let's do the histogram:

mp <- barplot(distribution_sorted[,"freq"],
              names.arg=distribution_sorted[,1],  # x-axis names
              las=2,            # turn labels by 90 degrees
              col=c("blue"),    # blue bars (just for fun)
              xlab="Kategorie", ylab="Häufigkeit"  # axis labels
)

There are many more settings to adapt; e.g., you can use `cex` to increase the font size for the numerical y-axis values (`cex.axis`), the categorical x-axis names (`cex.names`), and the axis labels (`cex.lab`).

In my plot there is one problem: my category names are much longer than the values on the y-axis, so the axis labels are positioned incorrectly. This is the point to give up and do the plot in Excel (ahem, LaTeX!) - or take input from fellow bloggers. They explain the issues way better than I could, so I will just post my final solution. I took the x-axis label out of the plot and inserted it separately with `mtext`. I then wanted a line for the x-axis as well, and in the end I took the x-axis names out of the plot again and put them into a separate `axis` at the bottom (`side=1`) with zero-length ticks (`tcl=0`) intersecting the y-axis at `pos=-0.3`.

# mai = space around the plot: bottom - left - top - right
# mgp = spacing for axis title - axis labels - axis line
par(mai=c(2.5,1,0.3,0.15), mgp=c(2.5,0.75,0))
mp <- barplot(distribution_sorted[,"freq"],
              #names.arg=distribution_sorted[,1],  # x-axis names/labels
              las=2,            # turn labels by 90 degrees
              col=c("blue"),    # blue bars (just for fun)
              ylab="Häufigkeit"  # axis title
)
axis(side=1, at=mp, pos=-0.3, tick=TRUE, tcl=0,
     labels=distribution_sorted[,1], las=2)
mtext("Kategorie", side=1, line=8.5)  # x-axis label

There has to be an easier way !?

Thank you Google Scholar Alerts for bringing to my attention this latest reference to one of my papers:

(3) Extraction based on semantic role labeling

Semantic role labeling (SRL) groups sequences of words and classifies them according to their semantic roles. The goal of SRL is to find the semantic constituents corresponding to the predicate of a given sentence, i.e., the core semantic roles (subject, object, etc.) and the adjunct roles (time, place, etc.). SRL labels only the relations between some constituents of a sentence and the predicate, so it is a form of shallow semantic analysis. Kessler et al. [37] used SRL to label and extract the elements of English comparative sentences, with better results than previous methods. However, SRL alone performs poorly at extracting Chinese comparative relations, so improvements of varying degrees have been made. For example: building an SRL model with mixed comparison patterns to label Chinese comparative sentences in two stages [9]; combining SRL with syntactic parse trees to propose semantic role analysis trees [28], extracting comparative relations by computing the matching similarity between two subtrees; and some researchers have tried applying CRFs to SRL [38]. These studies have achieved certain results, but the performance of SRL-based labeling for Chinese still needs improvement, and the extraction of comparative information spanning adjacent sentences has not yet been solved effectively.

Whatever it says, it counts towards my H-index!

From the category "bookmarks/links": A successful Git branching model

And still on the topic of LaTeX presentations, this time trying to plot a symbol over a bar to indicate significance.

This is how it works:

\node[xshift=\pgfkeysvalueof{/pgf/bar shift},anchor=south] at (axis cs:Xcoord1,0.47) {$\bullet$};

You need to put this code directly after the point where the data series has been plotted. Example:

\begin{tikzpicture}
\begin{axis}[xtick=data, axis x line*=bottom, axis y line=left,
             symbolic x coords={Xcoord1, Xcoord2}]
\addplot [ybar,seagreen] coordinates {(Xcoord1, -0.027) (Xcoord2, 0.436)};
\node[xshift=\pgfkeysvalueof{/pgf/bar shift},anchor=south] at (axis cs:Xcoord2,0.47) {$\bullet$};
\addlegendentry{System 1}
\addplot+ [ybar,blue] coordinates {(Xcoord1, 0.331) (Xcoord2, 0.095)};
\node[xshift=\pgfkeysvalueof{/pgf/bar shift},anchor=south] at (axis cs:Xcoord1,0.36) {$\bullet$};
\addlegendentry{System 2}
\addplot+ [ybar,orange] coordinates {(Xcoord1, 0.222) (Xcoord2, 0.441)};
\node[xshift=\pgfkeysvalueof{/pgf/bar shift},anchor=south] at (axis cs:Xcoord1,0.25) {$\bullet$};
\node[xshift=\pgfkeysvalueof{/pgf/bar shift},anchor=south] at (axis cs:Xcoord2,0.47) {$\bullet$};
\addlegendentry{System 3}
\end{axis}
\end{tikzpicture}

A while back I posted about using overlays for bar charts to show one value at a time. For my latest presentation I had a similar but slightly different wish: show all values for one system at a time, one system after the other.

Easily done, I just adapt the code from my previous post to show all values at the same time:

\newcommand{\addplotoverlay}[3][]{
  \alt<#3->{
    \addplot+ [ybar,#1] coordinates {#2};
  }{
    \addplot+ [ybar,#1] coordinates {(Xcoord1,0)}; % don't show zero values in plot
  }
}

This is specific to my plot: `Xcoord1` is one of my symbolic x-coordinates in the plot. Other than that, the code is completely independent of the coordinates used and their number, which makes it more flexible than my old stuff.

Usage (this will let seagreen bars at the given coordinates appear on slide 2):

\addplotoverlay[seagreen]{(Xcoord1, 0.331) (Xcoord2, 0.095)}{2}