Replace newlines with sed

sed is a Linux command-line tool for replacing text in a file or input stream. It normally works line by line: a line is read, the expression is applied, then the next line is read. Say we have a file with one word per line and we want to reconstruct the sentence. How do we replace all line breaks in the file with a space? Simple:

sed "{:q;N;s/\n/ /g;t q}" 

The substitution ‘s/\n/ /’ replaces a line break (\n) with a space, and ‘g’ applies it to every match in the pattern space. ‘N’ appends the next input line to what is being processed. Using ‘N’ on its own would only join lines in pairs, i.e., replace every second line break. The rest is a trick to join all lines together: we define the label q (‘:q;’), and ‘t q’ jumps back to that label whenever a substitution was successful. The loop ends when ‘N’ finds no more input, at which point GNU sed prints the joined line.
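To see the difference between a lone ‘N’ and the loop, here is a minimal sketch on a made-up four-word input. It assumes GNU sed; the sample words and the variable names are only for illustration, and the script is passed as separate -e pieces, which should behave like the one-liner above.

```shell
# Hypothetical sample input: one word per line.
input=$(printf 'the\nquick\nbrown\nfox')

# N alone joins lines in pairs: only every second line break is replaced.
pairs=$(printf '%s\n' "$input" | sed -e 'N' -e 's/\n/ /')

# With the label and the conditional jump, all lines end up in one pattern space.
joined=$(printf '%s\n' "$input" | sed -e ':q' -e 'N' -e 's/\n/ /g' -e 't q')

printf '%s\n' "$pairs"    # two lines: "the quick" and "brown fox"
printf '%s\n' "$joined"   # one line: "the quick brown fox"
```

Note that the loop keeps the whole input in the pattern space, so this is a trick for small to medium files rather than huge ones.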

Now all words are on one line, across sentence boundaries! In the input, sentences are separated by an empty line, and after joining each empty line shows up as two adjacent spaces. So easy: replace line breaks with spaces, then replace two adjacent spaces with a line break. That gives you one sentence per line, with words separated by spaces. Voilà:

cat  | sed "{:q;N;s/\n/ /g;t q}" | sed "{s/  /\n/g}"
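
The whole pipeline can be tried on a small made-up input. This is only a sketch, assuming GNU sed (the ‘\n’ in the replacement part of ‘s’ is a GNU extension); the sample text and variable names are hypothetical, and printf stands in for the input file.

```shell
# Hypothetical input: one word per line, sentences separated by an empty line.
input=$(printf 'I\nam\nhere\n\nYou\nare\nthere')

# Step 1: join all lines; the empty line becomes two adjacent spaces.
joined=$(printf '%s\n' "$input" | sed -e ':q' -e 'N' -e 's/\n/ /g' -e 't q')

# Step 2: turn each double space back into a line break.
sentences=$(printf '%s\n' "$joined" | sed 's/  /\n/g')

printf '%s\n' "$sentences"   # "I am here" and "You are there" on separate lines
```
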
Posted in Linux by swk.

About swk

I am a computational linguist, teacher of computer science and above all a huge fan of LaTeX. I use LaTeX for everything, including things you never wanted to do with LaTeX. My latest love is lilypond, aka LaTeX for music. I'll post at irregular intervals about cool stuff, stupid hacks and annoying settings I want to remember for the future.