Typesetting text in math mode (2)

In a previous post (Typesetting text in math mode) I advertised the use of \mbox to write text in mathematical formulas. This works at the normal text size, but looks funny in subscripts and superscripts, because \mbox does not shrink its contents and the sizes are off:

$ 50 \mbox{ apples}_{\mbox{yellow}} \times 
100 \mbox{ apples}_{\mbox{red-green}} 
= \mbox{lots of apples}^{\mbox{to eat}} $

looks like this: the text in the subscripts and in the superscript is set at full size, the same as the main line.

In these cases (and in the standard cases too, where the result looks the same), you can use the command \text, which comes out in the right font size. Besides plain \text, there are also \textbf (bold face), \textit (italics) and \texttt (typewriter).

$ 50 \text{ apples}_{\text{yellow}} \times 
100 \textit{ apples}_{\texttt{red-green}} 
= \textbf{lots of apples}^\text{to eat} $

looks like this: the subscripts and the superscript are scaled down to the right size, and the italic, typewriter and bold variants come out as requested.

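Note: \text is not part of the LaTeX kernel. It is defined by the package amstext, which amsmath loads automatically. Some document classes and packages load it for you, which is why \text often just works, but if you get an "Undefined control sequence" error, load amsmath explicitly. \textbf, \textit and \texttt exist without any package, but they only adapt their size in subscripts once amstext is loaded.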
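To compare the two commands side by side, here is a minimal test document. It is only a sketch: it assumes the article class and loads amsmath explicitly, so \text is defined no matter which distribution you use.

```latex
\documentclass{article}
\usepackage{amsmath} % loads amstext, which defines \text

\begin{document}

% \mbox keeps the size of the surrounding text, even inside a subscript
$ 50 \mbox{ apples}_{\mbox{yellow}} $

% \text follows the current math style, so the subscript is scaled down
$ 50 \text{ apples}_{\text{yellow}} $

\end{document}
```

Compiling this should show "yellow" at full size in the first formula and at subscript size in the second.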

Typesetting text in math mode

In information retrieval and text classification, tf-idf plays a big role. Read the Wikipedia article to learn what it is about; here I want to deal with the problem of typesetting the formula in LaTeX.

The formula is log-weighted term frequency tf times inverse document frequency idf. If we naively write this down, we arrive at this:

tf-idf_{t,d} = (1 +\log tf_{t,d}) \cdot \log \frac{N}{df_t}

When you look at the LaTeX output, you will see that several things go wrong. In math mode, LaTeX interprets two letters next to each other as a product of two variables. So the name tf becomes the mathematical expression “t times f” and is typeset accordingly. Also, the name tf-idf contains a hyphen, and in math mode a hyphen between two expressions is interpreted as a minus sign. So this is definitely not what we want.

How do we solve the problem? What we want is that this part is interpreted as normal text. One possibility to add text to equations is the command \mbox{} (another is the command \text{} which requires the amsmath package). So this is it:

\mbox{tf-idf}_{t,d} = (1 +\log \mbox{tf}_{t,d}) \cdot \log \frac{N}{\mbox{df}_t}
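If the names show up in many formulas, it can be convenient to wrap them in macros. The following is a minimal sketch of the \text variant mentioned above, assuming the article class and the amsmath package. The macro names \tfidf, \termfreq and \docfreq are made up for this example and are not part of any package.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \text

% Hypothetical shorthands for the multi-letter names
\newcommand{\tfidf}{\text{tf-idf}}   % the hyphen stays a hyphen inside \text
\newcommand{\termfreq}{\text{tf}}
\newcommand{\docfreq}{\text{df}}

\begin{document}

\[
  \tfidf_{t,d} = (1 + \log \termfreq_{t,d}) \cdot \log \frac{N}{\docfreq_t}
\]

\end{document}
```

One thing to keep in mind: \text uses the font of the surrounding text, so inside an italic paragraph the names come out in italics as well. If they should always be upright, \textnormal can be used in the definitions instead.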