Fun with newlines

Use a typewriter lately? No? Well, who cares… except when you encounter stupidities left over from the early days of computing where people were still used to typewriters. Because typewriters had two ways of going to a new line, ASCII knows two ways of representing the newline:

  • LF (line feed, German Zeilenvorschub), represented as Unicode code point 0x0A, ASCII 00001010 and escape character \n
  • CR (carriage return, German Wagenrücklauf), represented as Unicode code point 0x0D, ASCII 00001101 and escape character \r
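
If you don't trust the numbers, you can ask Python. This is just a quick sketch; the helper name describe is made up for illustration.

```python
def describe(ch):
    """Return (hex code point, 8-bit binary string) for a single character."""
    code = ord(ch)
    return f"0x{code:02X}", f"{code:08b}"

for name, ch in (("LF", "\n"), ("CR", "\r")):
    print(name, *describe(ch))
# LF 0x0A 00001010
# CR 0x0D 00001101
```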

ASCII was one of the first widely adopted encodings for representing text in bits. It’s from the 1960s, and at the time someone probably thought it was a good idea to have two characters for the concept of a new line. We’d think "who cares about stuff from the 1960s", it’s 2017, right? But unfortunately many later encodings are based on ASCII, most notably those from the Unicode family, e.g., the widely used UTF-8. So – thank you, 1960s! /sarcasm

Two characters for a new line would not be too bad if they were used consistently, but that is where the fun begins. Of course they are not! Different operating systems use different conventions to mark the end of a line:

  • Linux and Mac OS X use LF
  • Windows uses CR LF
  • (and to make the chaos complete, Mac OS before version X used CR)
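
One way to cope is to read the file as raw bytes and convert everything to LF yourself. Here is a minimal sketch of that idea; detect_newline and normalize_newlines are hypothetical helper names, and the detection is only a heuristic that assumes the file sticks to one convention throughout.

```python
def detect_newline(data: bytes) -> str:
    """Guess the newline convention of raw file contents (heuristic)."""
    if b"\r\n" in data:
        return "CRLF"   # Windows
    if b"\r" in data:
        return "CR"     # Mac OS before version X
    if b"\n" in data:
        return "LF"     # Linux, Mac OS X
    return "none"       # no line break found at all


def normalize_newlines(data: bytes) -> bytes:
    """Convert CR LF and lone CR to LF."""
    # Replace CR LF first; otherwise each CR LF would turn into two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(detect_newline(b"one\r\ntwo\r\n"))
print(normalize_newlines(b"one\r\ntwo\rthree\n"))
```

In practice you often don't need this in Python: opening a file in text mode with the default newline=None already translates all three conventions to \n when reading.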

So have fun reading "plain text" files! /sarcasm