In information retrieval and text classification, tf-idf plays a big role. Read the Wikipedia article to learn what it is about; here I want to deal with the problem of typesetting the formula in LaTeX.

The formula is the log-weighted term frequency *tf* times the inverse document frequency *idf*. If we naively write this down, we arrive at this:

tf-idf_{t,d} = (1 +\log tf_{t,d}) \cdot \log \frac{N}{df_t}

When you look at the LaTeX output, you will see that several things go wrong. In math mode, LaTeX interprets two letters next to each other as a product of two variables. So the name *tf* becomes the mathematical expression “t times f” and is typeset accordingly, in italics and with the spacing of separate variables. Also, in the case of *tf-idf*, the name contains a hyphen. In math mode, a hyphen between two expressions is interpreted as a minus sign, so the name reads as “tf minus idf”. This is definitely not what we want.

How do we solve the problem? We want these names to be interpreted as normal text. One way to add text to equations is the command `\mbox{}` (another is the command `\text{}`, which requires the amsmath package). With `\mbox{}`, the formula becomes:

\mbox{tf-idf}_{t,d} = (1 +\log \mbox{tf}_{t,d}) \cdot \log \frac{N}{\mbox{df}_t}
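
To compare the variants side by side, here is a minimal sketch of a complete document that typesets the naive formula, the `\mbox{}` version, and a `\text{}` version. The macro names `\tfidf`, `\termfreq`, and `\docfreq` are made up for this illustration; any unused names would do.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \text

% Hypothetical helper macros (names chosen for this example only)
\newcommand{\tfidf}{\text{tf-idf}}
\newcommand{\termfreq}{\text{tf}}
\newcommand{\docfreq}{\text{df}}

\begin{document}

% Naive version: letters are set as separate variables, the hyphen as a minus sign
\[ tf-idf_{t,d} = (1 + \log tf_{t,d}) \cdot \log \frac{N}{df_t} \]

% \mbox version: names are set as normal text
\[ \mbox{tf-idf}_{t,d} = (1 + \log \mbox{tf}_{t,d}) \cdot \log \frac{N}{\mbox{df}_t} \]

% \text version, written with the helper macros
\[ \tfidf_{t,d} = (1 + \log \termfreq_{t,d}) \cdot \log \frac{N}{\docfreq_t} \]

\end{document}
```

One practical difference between the two commands: `\text{}` adapts its size when used inside a subscript or superscript, while `\mbox{}` always uses the normal text size. In the formula above the names only appear at base size, so both versions should look the same.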