Histograms of category frequencies in R

I am learning R, so this is my first attempt to create histograms in R. The data that I have is a vector of one category for each data point. For this example we will use a vector of a random sample of letters. The important thing is that we want a histogram of the frequencies of texts, not numbers. And the texts are longer than just one letter. So let’s start with this:

labels <- sample(letters[1:20],100,replace=TRUE)
labels <- vapply(seq_along(labels), 
                 function(x) paste(rep(labels[x],10), collapse = ""),
                 character(1L)) # Repeat each letter 10 times
library(plyr) # for the function 'count'
distribution <- count(labels)
distribution_sorted <- 
   distribution[order(distribution[,"freq"], decreasing=TRUE),]

I use the function count from the package plyr to get a matrix distribution with the different categories in column one (called "x") and the number of times this label occurs in column two (called "freq"). As I would like the histogram to display the categories from the most frequent to the least frequent one, I then sort this matrix by frequency with the function order. The function gives back a vector of indices in the correct order, so I need to plug this into the original matrix as row numbers.

Now let's do the histogram:

mp <- barplot(distribution_sorted[,"freq"],
         names.arg=distribution_sorted[,1], # X-axis names
         las=2,  # turn labels by 90 degrees
         col=c("blue"), # blue bars (just for fun)
         xlab="Kategorie", ylab="Häufigkeit", # Axis labels
         )

There are many more settings to adapt, e.g., you can use cex to increase the font size for the numerical y-axis values (cex.axis), the categorical x-axis names (cex.names), and axis labels (cex.lab).

In my plot there is one problem. My categorie names are much longer than the values on the y-axis and so the axis labels are positioned incorrectly. This is the point to give up and do the plot in Excel (ahem, LaTeX!) - or take input from fellow bloggers. They explain the issues way better than me, so I will just post my final solution. I took the x-axis label out of the plot and inserted it separately with mtext. I then wanted a line for the x-axis as well and in the end I took out the x-axis names from the plot again and put them into a separate axis at the bottom (side=1) with zero-length ticks (tcl=0) intersecting the y-axis at pos=-0.3.

# mai = space around the plot: bottom - left - top - right
# mgp = spacing for axis title - axis labels - axis line
par(mai=c(2.5,1,0.3,0.15), mgp=c(2.5,0.75,0))
mp <- barplot(distribution_sorted[,"freq"],
         #names.arg=distribution_sorted[,1], # X-axis names/labels
         las=2,  # turn labels by 90 degrees
         col=c("blue"), # blue bars (just for fun)
         ylab="Häufigkeit", # Axis title
         )
axis(side=1, at=mp, pos=-0.3, 
     tick=TRUE, tcl=0, 
     labels=distribution_sorted[,1], las=2, 
     )
mtext("Kategorie", side=1, line=8.5) # x-axis label

There has to be an easier way !?