How to join lines not starting with a specific pattern to the previous line in UNIX?

Please try the following:

awk 'BEGIN {accum_line = "";} /^These/{if(length(accum_line)){print accum_line; accum_line = "";}} {accum_line = (length(accum_line) ? accum_line " " : "") $0;} END {if(length(accum_line)){print accum_line; }}' < data.txt

The code consists of three parts:

  1. The block marked by BEGIN is executed before any input is read. It is useful for global initialization.
  2. The block marked by END is executed after regular processing has finished. It is good for wrapping things up, such as printing the last collected data, which no later line starting with These will flush (as is the case here).
  3. The rest is the code performed for each line. First, the pattern is searched for and, on a match, the data collected so far is printed and reset. Second, data collection is done regardless of the line's contents.
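
The three parts are easier to see when the program is laid out over several lines. The following is only a sketch of that structure wrapped in a shell function: the name join_lines and the sample input are made up for illustration and are not the question's data file. The append step adds the separating space only when something has already been collected, so the output lines carry no leading blank.

```shell
#!/bin/sh
# join_lines is a hypothetical helper name: it reads stdin and writes the joined lines.
join_lines() {
    awk '
        # Part 1: runs once, before any input is read.
        BEGIN { accum_line = "" }

        # Part 3a: a line starting with "These" begins a new record,
        # so flush whatever has been collected so far.
        /^These/ {
            if (length(accum_line)) {
                print accum_line
                accum_line = ""
            }
        }

        # Part 3b: every line is appended to the accumulator,
        # with a separating space only when it is non-empty.
        { accum_line = (length(accum_line) ? accum_line " " : "") $0 }

        # Part 2: runs after the last line and flushes the final record.
        END { if (length(accum_line)) print accum_line }
    '
}

# Sample input invented for this sketch.
printf '%s\n' 'These are leaves.' 'These are greenery which gives' 'oxygen.' | join_lines
# prints:
# These are leaves.
# These are greenery which gives oxygen.
```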

With sed:

sed ':a;N;/\nThese/!s/\n/ /;ta;P;D' infile

resulting in

These are leaves.
These are branches.
These are greenery which gives oxygen, provides control over temperature and maintains cleans the air.
These are tigers
These are bears and deer and squirrels and other animals.
These are something you want to kill Which will see you killed in the end.
These are things you must to think to save your tomorrow.

Here is how it works:

sed '
:a                   # Label to jump to
N                    # Append next line to pattern space
/\nThese/!s/\n/ /    # If the newline is NOT followed by "These", append
                     # the line by replacing the newline with a space
ta                   # If we changed something, jump to label
P                    # Print part until newline
D                    # Delete part until newline
' infile

The N;P;D is the idiomatic way of keeping multiple lines in the pattern space; the conditional branching part takes care of the situation where we append more than one line.
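
To watch the loop handle a record that spans more than two lines, here is a small sketch that feeds a made-up input through the same commands. It assumes GNU sed, which prints the pattern space when N finds no further input; the wrapper name join_with_sed and the sample text are hypothetical.

```shell
#!/bin/sh
# join_with_sed is a hypothetical wrapper around the script above; assumes GNU sed.
join_with_sed() {
    # Same commands as the one-liner, one per -e so the label stands alone.
    sed -e ':a' -e 'N' -e '/\nThese/!s/\n/ /' -e 'ta' -e 'P' -e 'D'
}

# Sample input invented for this sketch: the second record spans three lines,
# so the "ta" branch is taken twice before P and D run.
printf '%s\n' \
    'These are leaves.' \
    'These are greenery which gives' \
    'oxygen, provides control' \
    'over temperature.' \
    'These are tigers' | join_with_sed
```

With GNU sed this should print three lines, one per record, with the continuation lines folded into the second.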

This works with GNU sed; for other seds, like the one found in macOS, the one-liner has to be split up so that the label and the branching are in separate commands, the newlines may have to be escaped, and we need an extra semicolon:

sed -e ':a' -e 'N;/'$'\n''These/!s/'$'\n''/ /;ta' -e 'P;D;' infile

This last command is untested; see this answer for differences between different seds and how to handle them.

Another alternative is to enter the newlines literally:

sed -e ':a' -e 'N;/\
These/!s/\
/ /;ta' -e 'P;D;' infile

But then, by definition, it's no longer a one-liner.


awk '$1=="These"{if(row!="")print row;row=$0;next}{row=row " " $0}END{if(row!="")print row}'

You can take it from there: blank lines, separators, and
other unspecified behaviors are not handled (untested).
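
As one possible way to take it from there, the sketch below passes the leading word in as a variable and skips blank lines. Both choices are assumptions, since the question does not say how blank lines should be treated; the function name join_by_prefix and the sample input are likewise made up.

```shell
#!/bin/sh
# join_by_prefix is a hypothetical helper: $1 is the first word that starts a record.
join_by_prefix() {
    awk -v prefix="$1" '
        # Assumption: blank lines carry no data and are dropped.
        NF == 0 { next }

        # A line whose first field equals the prefix starts a new record.
        $1 == prefix {
            if (row != "") print row
            row = $0
            next
        }

        # Any other line continues the current record.
        { row = (row != "" ? row " " : "") $0 }

        # Flush the last record at end of input.
        END { if (row != "") print row }
    '
}

printf '%s\n' 'These are tigers' '' 'These are bears' 'and deer.' | join_by_prefix These
# prints:
# These are tigers
# These are bears and deer.
```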