Fixing the locale on Kubuntu

After installing my brand new Kubuntu, I got the following error:

locale: Cannot set LC_ALL to default locale: No such file or directory
perl: warning: Setting locale failed.

Here is how to fix it.

The first step is to look at the current settings with the locale command. This should give output similar to the following:

> locale
LANG=en_IE.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_DE.UTF-8"
LC_NUMERIC=en_DE.UTF-8
LC_TIME=en_DE.UTF-8
LC_COLLATE="en_DE.UTF-8"
LC_MONETARY=en_DE.UTF-8
LC_MESSAGES="en_DE.UTF-8"
LC_PAPER=en_DE.UTF-8
LC_NAME=en_DE.UTF-8
LC_ADDRESS=en_DE.UTF-8
LC_TELEPHONE=en_DE.UTF-8
LC_MEASUREMENT=en_DE.UTF-8
LC_IDENTIFICATION=en_DE.UTF-8
LC_ALL=

This output might already give you a hint about what is wrong. In my case, the entries with en_DE look fishy. The next step is to see which locales are installed on your machine. This is done by passing the parameter -a to locale:

> locale -a
C
C.UTF-8
en_GB.utf8
en_IE.utf8
en_US.utf8
en_ZA.utf8
POSIX

Here we have already found the problem: the output does not contain the locale “en_DE”. In this case, that is because “en_DE” does not actually exist. I have no idea where it came from, but somehow the combination of being in Germany and installing an English operating system produced it. So what I want to do is set everything that has the wrong locale to the correct locale “de_DE” instead.
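
If you want to hunt down where such a bogus value actually comes from, the usual suspects on Ubuntu-based systems are /etc/default/locale and the per-user shell startup files. This is only a sketch of where I would look; the list of files is an assumption and by no means exhaustive:

grep -s en_DE /etc/default/locale ~/.pam_environment ~/.profile ~/.bashrc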

As the German locale is not installed on our system yet, we first have to create it. This is done with locale-gen:

> sudo locale-gen de_DE.utf8
Generating locales (this might take a while)...
  de_DE.UTF-8... done   
Generation complete.
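
To double-check that the new locale is really available, you can list the installed locales again; de_DE.utf8 should now show up in the output:

> locale -a | grep de_DE
de_DE.utf8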

Now we can set it as the default with the following:

sudo dpkg-reconfigure locales
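
If you prefer a non-interactive route, the defaults stored in /etc/default/locale can also be written directly with update-locale. A sketch, assuming you want to point the wayward categories at the freshly generated German locale; adjust the list to whatever looked wrong in your own locale output:

sudo update-locale LC_TIME=de_DE.UTF-8 LC_NUMERIC=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8 LC_PAPER=de_DE.UTF-8 LC_MEASUREMENT=de_DE.UTF-8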

To be on the safe side, you should restart the computer for the changes to take effect.

Find file names with invalid encoding on Linux

I have files that were copied from Windows computers in ancient times. The file names contain special characters, and these have been messed up somewhere along the way. For example, I have a file named 9.5.2 Modelo de aceptaci??n (espa??ol).doc in the folder 9 Garant??a del Estado.

First, I want to find and list these files. Stack Exchange tells us how to do that:

LC_ALL=C find . -name '*[! -~]*'

The pattern [! -~] matches any character outside the printable ASCII range (space to tilde), so this finds all names that contain non-ASCII characters, not only those that are broken. But in my case I have folders where ALL of the names are broken, so I don’t mind.

Second, I want to fix the names. I did it manually, but for future reference, if I ever have to do anything like that again, I might use one of the solutions proposed in this thread on serverfault.com.
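
One tool I would probably reach for is convmv, which rewrites file names from one encoding to another. A hedged sketch, assuming the broken names are really old Latin-1 bytes on an otherwise UTF-8 system: the first call is a dry run that only prints what would change, and adding --notest performs the actual renaming.

convmv -f latin1 -t utf8 -r .
convmv -f latin1 -t utf8 -r --notest .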

Change the encoding of a file

My favourite topic is "encoding" (that was sarcasm, of course). So my first post is about how to change the encoding of a text file from Latin-1 to UTF-8 on the command line:

iconv -f latin1 -t utf8 source_file > target_file

Of course we need to know what encoding the file is in… which may be a topic for some future post.
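
For what it is worth, the file utility can make an educated guess at the encoding, although it is only a heuristic and can be wrong for short or ambiguous files:

file -b --mime-encoding source_file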