# Histograms of category frequencies in R

I am learning R, so this is my first attempt to create histograms in R. The data that I have is a vector with one category per data point. For this example we will use a random sample of letters. The important thing is that we want a histogram (strictly speaking, a bar chart) of the frequencies of text labels, not numbers, and the labels are longer than a single letter. So let’s start with this:

labels <- sample(letters[1:20], 100, replace = TRUE)
# Repeat each letter 10 times to get longer labels
labels <- vapply(seq_along(labels),
  function(x) paste(rep(labels[x], 10), collapse = ""),
  character(1L))
library(plyr) # for the function 'count'
distribution <- count(labels)
distribution_sorted <-
  distribution[order(distribution[, "freq"], decreasing = TRUE), ]


I use the function count from the package plyr to get a data frame distribution with the different categories in column one (called "x") and the number of times each label occurs in column two (called "freq"). As I would like the histogram to display the categories from the most frequent to the least frequent one, I then sort this data frame by frequency with the function order. The function gives back a vector of indices in the correct order, so I need to use this vector as the row index of the original data frame.

Now let's do the histogram:

mp <- barplot(distribution_sorted[, "freq"],
  names.arg = distribution_sorted[, 1], # x-axis names
  las = 2, # turn labels by 90 degrees
  col = "blue", # blue bars (just for fun)
  xlab = "Category", ylab = "Frequency" # axis labels
)


There are many more settings to adapt, e.g., you can use the cex parameters to increase the font size for the numerical y-axis values (cex.axis), the categorical x-axis names (cex.names), and the axis labels (cex.lab).

In my plot there is one problem. My category names are much longer than the values on the y-axis, so the x-axis label is positioned incorrectly and overlaps the names. This is the point to give up and do the plot in Excel (ahem, LaTeX!) - or take input from fellow bloggers. They explain the issues way better than me, so I will just post my final solution. I took the x-axis label out of the plot and inserted it separately with mtext. I then wanted a line for the x-axis as well, so in the end I also took the x-axis names out of the plot and put them into a separate axis at the bottom (side=1) with zero-length ticks (tcl=0), intersecting the y-axis at pos=-0.3.

# mai = space around the plot in inches: bottom - left - top - right
# mgp = spacing for axis title - axis labels - axis line
par(mai = c(2.5, 1, 0.3, 0.15), mgp = c(2.5, 0.75, 0))
mp <- barplot(distribution_sorted[, "freq"],
  # names.arg = distribution_sorted[, 1], # x-axis names, now drawn by axis() below
  las = 2, # turn labels by 90 degrees
  col = "blue", # blue bars (just for fun)
  ylab = "Frequency" # y-axis title
)
# mp holds the bar midpoints, so the names end up centered under the bars
axis(side = 1, at = mp, pos = -0.3,
  tick = TRUE, tcl = 0,
  labels = distribution_sorted[, 1], las = 2
)
mtext("Category", side = 1, line = 8.5) # x-axis label


There has to be an easier way!?

# Citations

Thank you Google Scholar Alerts for bringing to my attention this latest reference to one of my papers:

(3) 基于语义角色标注的提取

Kessler 等 [37] 运用 SRL 对英文比较句的元素进行标注

SRL 中 [38] 。上述研究取得了一定成果, 但是采用 SRL

Whatever it says, it counts towards my H-index!

# Marking significance in a bar plot

Still on the topic of LaTeX presentations: this time I am trying to plot a symbol over a bar to indicate significance.

This is how it works:

\node[xshift=\pgfkeysvalueof{/pgf/bar shift},anchor=south] at (axis cs:Xcoord1,0.47) {$\bullet$};


You need to put this code directly after the \addplot of the data series it belongs to, so that the node picks up the bar shift of that series. Example:

\begin{tikzpicture}
\begin{axis}[xtick=data,axis x line*=bottom,axis y line=left,symbolic x coords={Xcoord1, Xcoord2}]

\addplot [ybar,seagreen] coordinates {(Xcoord1, -0.027) (Xcoord2, 0.436)};
\node[xshift=\pgfkeysvalueof{/pgf/bar shift},anchor=south] at (axis cs:Xcoord2,0.47) {$\bullet$};

\addplot+ [ybar,blue] coordinates  {(Xcoord1, 0.331) (Xcoord2, 0.095)};
\node[xshift=\pgfkeysvalueof{/pgf/bar shift},anchor=south] at (axis cs:Xcoord1,0.36) {$\bullet$};

\addplot+ [ybar,orange] coordinates {(Xcoord1, 0.222) (Xcoord2, 0.441)};
\node[xshift=\pgfkeysvalueof{/pgf/bar shift},anchor=south] at (axis cs:Xcoord1,0.25) {$\bullet$};
\node[xshift=\pgfkeysvalueof{/pgf/bar shift},anchor=south] at (axis cs:Xcoord2,0.47) {$\bullet$};
\end{axis}
\end{tikzpicture}
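If you mark many bars, the \node line gets repetitive. Here is a minimal sketch of a wrapper macro that hides it. The name \sigmark is my own invention for this sketch, not something pgfplots provides, and I swapped seagreen for teal so that the example compiles without extra color definitions.

```latex
\documentclass{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16} % assumption: any reasonably recent pgfplots

% Hypothetical helper: \sigmark{<x coordinate>}{<y position>}
% Expands to exactly the \node line from the post.
\newcommand{\sigmark}[2]{%
  \node[xshift=\pgfkeysvalueof{/pgf/bar shift},anchor=south]
    at (axis cs:#1,#2) {$\bullet$};%
}

\begin{document}
\begin{tikzpicture}
\begin{axis}[xtick=data,axis x line*=bottom,axis y line=left,
             symbolic x coords={Xcoord1, Xcoord2}]

\addplot [ybar,teal] coordinates {(Xcoord1, -0.027) (Xcoord2, 0.436)};
\sigmark{Xcoord2}{0.47} % mark belongs to the series plotted just above

\addplot+ [ybar,blue] coordinates {(Xcoord1, 0.331) (Xcoord2, 0.095)};
\sigmark{Xcoord1}{0.36}

\end{axis}
\end{tikzpicture}
\end{document}
```

The y position is still chosen by hand, a little above the top of the bar, just as in the example above.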


# Overlays for bar charts (take 2)

A while back I posted about using overlays for bar charts to show one value at a time. For my latest presentation I had a similar but slightly different wish: show all values for one system at a time, one system after the other.

Easily done, I just adapted the code from my previous post to show all values of one series at the same time:

\newcommand{\addplotoverlay}[3][]{
\alt<#3->{
\addplot+ [ybar,#1] coordinates {#2}; % the actual values
}{
\addplot+ [ybar,#1] coordinates {(Xcoord1,0)}; % + don't show zero values in plot
}
}


The placeholder coordinate is specific to my plot: Xcoord1 is one of my symbolic x-coordinates. Other than that, the code is completely independent of the coordinates used and of how many there are, which makes it more flexible than my old stuff.

Usage (this will let seagreen bars at the given coordinates appear from slide 2 on):

\addplotoverlay[seagreen]{(Xcoord1, 0.331) (Xcoord2, 0.095)}{2}
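To see the macro in context, here is a minimal sketch of a complete frame. It assumes that the first branch of the macro plots the coordinates passed as the second argument. The colors, the frame title and the fixed ymin/ymax are my own choices for illustration; I fix the y-range so that the axis does not rescale when a new series appears.

```latex
\documentclass{beamer}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16} % assumption: any reasonably recent pgfplots

% [color]{coordinates}{first slide on which the series is visible}
\newcommand{\addplotoverlay}[3][]{%
  \alt<#3->{%
    \addplot+ [ybar,#1] coordinates {#2};%
  }{%
    % placeholder series so that the bar positions stay the same
    \addplot+ [ybar,#1] coordinates {(Xcoord1,0)};%
  }%
}

\begin{document}
\begin{frame}{Results}
\begin{tikzpicture}
\begin{axis}[xtick=data,axis x line*=bottom,axis y line=left,
             symbolic x coords={Xcoord1, Xcoord2},
             ymin=-0.1,ymax=0.5] % fixed range: no rescaling between slides
\addplotoverlay[teal]{(Xcoord1, -0.027) (Xcoord2, 0.436)}{1}
\addplotoverlay[blue]{(Xcoord1, 0.331) (Xcoord2, 0.095)}{2}
\addplotoverlay[orange]{(Xcoord1, 0.222) (Xcoord2, 0.441)}{3}
\end{axis}
\end{tikzpicture}
\end{frame}
\end{document}
```

The first series is visible from slide 1 on, so xtick=data always finds both x-coordinates.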


# LaTeX ‘correct’ and ‘wrong’ symbols with TikZ

A symbol for a checkmark to indicate something is correct:

\newcommand{\correct}{$\color{green}\tikz\fill[scale=0.4](0,.35) -- (.25,0) -- (1,.7) -- (.25,.15) -- cycle;$}


A symbol for a cross to indicate something is wrong:

\newcommand{\wrong}{$\mathbin{\tikz [x=1.4ex,y=1.4ex,line width=.2ex, red] \draw (0,0) -- (1,1) (0,1) -- (1,0);}$}%


You’ll need to load TikZ (\usepackage{tikz}) for this.
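For completeness, a minimal sketch of a document that uses both symbols, once in running text and once as list markers. The example sentences are made up; TikZ pulls in xcolor, which provides \color.

```latex
\documentclass{article}
\usepackage{tikz} % also loads xcolor for \color

\newcommand{\correct}{$\color{green}\tikz\fill[scale=0.4](0,.35) -- (.25,0) -- (1,.7) -- (.25,.15) -- cycle;$}
\newcommand{\wrong}{$\mathbin{\tikz [x=1.4ex,y=1.4ex,line width=.2ex, red] \draw (0,0) -- (1,1) (0,1) -- (1,0);}$}

\begin{document}
The first answer is right \correct{} and the second one is not \wrong{}.

\begin{itemize}
  \item[\correct] This item is correct.
  \item[\wrong] This item is wrong.
\end{itemize}
\end{document}
```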

# LaTeX presentation background picture

In one slide of a presentation I wanted to have a background picture and overlay it with several text blocks one after the other to have the effect of the text “coming out of” the background. It is tricky to align things in LaTeX beamer, especially if you want to have them on top of each other, so this is my solution: two minipages on top of each other, each covering the whole slide.

The usable area of a slide is more or less 7cm high (depending a bit on your template). There probably is a length defined for that, but I was too lazy to look for it so I took the actual value. The usable width is of course \textwidth. I use vertically centered alignment for the minipage, but that is up to you (see the post Set height of a minipage for the options you can give to minipage).

The way it now works is the following. Create one minipage of full width and height. Use this to display the background image. Then jump back the full height and create a second minipage of full width and height to display the text inside of that. This is the code for my slide:

\begin{minipage}[c][7cm][c]{\textwidth}
\centering
\includegraphics[width=0.8\linewidth]{img/Reviews}
\end{minipage}

\vspace{-7cm}
\begin{minipage}[c][7cm][c]{\textwidth}
\centering

\visible<2->{
\colorbox{white}{\fbox{\textcolor{blue}{I was impressed by the fast shutter speed of D3200.}\only<3->{\textcolor{darkgreen}{~(\emph{positive})}}}}
}

\vspace{1cm}
\visible<4->{
\colorbox{white}{\fbox{\textcolor{blue}{The autofocus was \textbf{not} so reliable.}\only<5->{\textcolor{red}{~(\emph{negative})}}}}
}
\end{minipage}
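If you would rather not hard-code the 7cm, one option is to derive the height from \textheight, the standard LaTeX length for the height of the text area. This is a sketch under assumptions, not a drop-in replacement: the length name \overlayheight and the factor 0.8 are my own choices and need tuning for your template, and example-image is a placeholder picture from the mwe package that stands in for the real image.

```latex
\documentclass{beamer}

\newlength{\overlayheight} % hypothetical helper length

\begin{document}
\begin{frame}{Reviews}
% Assumption: 0.8\textheight leaves enough room for the frame title
\setlength{\overlayheight}{0.8\textheight}

% First minipage: the background image
\begin{minipage}[c][\overlayheight][c]{\textwidth}
\centering
\includegraphics[width=0.8\linewidth]{example-image}
\end{minipage}

% Jump back by the same height and put the text on top
\vspace{-\overlayheight}
\begin{minipage}[c][\overlayheight][c]{\textwidth}
\centering

\visible<2->{
\colorbox{white}{\fbox{\textcolor{blue}{First text block.}}}
}

\vspace{1cm}
\visible<3->{
\colorbox{white}{\fbox{\textcolor{red}{Second text block.}}}
}
\end{minipage}
\end{frame}
\end{document}
```

Because both minipages and the \vspace use the same length, changing the factor in one place keeps them aligned.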