How to process a multi-column text file to get another multi-column text file?

Put each field on its own line, then columnate the result.

Each field on one line

tr

tr -s ' ' '\n' < infile

grep

grep -oE '[[:alnum:]]+' infile

sed

sed 's/\s\+/\n/g' infile

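or, more portably (POSIX sed supports neither \s nor \+, and \n in the replacement is unspecified):

sed 's/[[:space:]][[:space:]]*/\
/g' infile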

awk

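awk 'NF { $1 = $1; print }' OFS='\n' infile

or

awk -v OFS='\n' 'NF { $1 = $1; print }' infile

Assigning $1 to itself makes awk rebuild the record with OFS between the fields. The NF guard skips blank lines; a bare '$1=$1' pattern would also skip lines whose first field is 0.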

Columnate

paste

For 2 columns:

... | paste - -

For 3 columns:

... | paste - - -

etc.
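For an arbitrary number of columns, the list of - operands can be built in a loop instead of typed out. A minimal sketch, assuming a POSIX shell; the function name paste_cols is hypothetical:

```shell
# paste_cols N: read one field per line on stdin and join every N lines
# with tabs, by calling paste with N "-" operands (hypothetical helper).
paste_cols() {
  n=$1
  set --                      # reuse the positional parameters as the operand list
  while [ "$#" -lt "$n" ]; do
    set -- "$@" -             # append one more "-" operand
  done
  paste "$@"                  # e.g. "paste - - -" for N=3
}
```

For example: tr -s ' ' '\n' < infile | paste_cols 3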

sed

For 2 columns:

... | sed 'N; s/\n/\t/g'

For 3 columns:

... | sed 'N; N; s/\n/\t/g'

etc.
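The sed version generalizes the same way: n columns need n-1 N commands before the substitution. A sketch assuming GNU sed (for \t in the replacement); the function name sed_cols is hypothetical:

```shell
# sed_cols N: join every N input lines with tabs using GNU sed.
# Builds a script such as "N;N;s/\n/\t/g" for N=3 (hypothetical helper).
sed_cols() {
  script=''
  i=1
  while [ "$i" -lt "$1" ]; do
    script="${script}N;"      # each N appends the next input line to the pattern space
    i=$((i + 1))
  done
  sed "${script}s/\n/\t/g"    # replace the embedded newlines with tabs
}
```

If the number of lines is not a multiple of n, GNU sed still prints the final incomplete group, while other seds may drop it when N hits end of input.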

xargs

... | xargs -n number-of-desired-columns

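xargs runs /bin/echo by default, so data that looks like an option to echo (such as -n or -e) will be interpreted as one. xargs also treats quotes and backslashes in its input specially, and echo separates the columns with single spaces rather than tabs.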
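One way around echo's option handling is to have xargs run printf instead: the format string always comes first, so a field such as -n is printed literally. A sketch for two columns; xargs's own treatment of quotes and backslashes still applies:

```shell
# Two tab-separated columns via xargs, running printf instead of the default echo.
# xargs appends two fields per printf call, after the format string.
printf '%s\n' -n foo a b | xargs -n 2 printf '%s\t%s\n'
```

This prints -n and foo on the first row and a and b on the second, each pair separated by a tab.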

awk

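... | awk '{ printf "%s", $0 (NR%n==0?ORS:OFS) } END { if (NR%n) printf "%s", ORS }' n=number-of-desired-columns OFS='\t'

The END block terminates a final row that has fewer than n fields.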
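Both stages can also be done in a single awk pass by walking the fields of each input line directly, instead of first putting one field per line. A sketch of that variation, not taken from the answers here; the function name awk_cols is hypothetical:

```shell
# awk_cols N: print the whitespace-separated fields of stdin N per row,
# tab-separated, in one awk pass (hypothetical helper).
awk_cols() {
  awk -v n="$1" -v OFS='\t' '
    {
      # ++c counts fields across all lines; every n-th field ends a row
      for (i = 1; i <= NF; i++)
        printf "%s%s", $i, (++c % n ? OFS : ORS)
    }
    END { if (c % n) printf "%s", ORS }   # terminate a final incomplete row
  '
}
```

For example: awk_cols 3 < infile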

pr

... | pr -at -number-of-desired-columns

or

... | pr -at -s$'\t' -number-of-desired-columns

columns (from the autogen package)

... | columns -c number-of-desired-columns

Typical output:

a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj

A caveat for the printf approach that follows: as Wildcard pointed out, it only works if your file is nicely formatted, that is, it contains no characters the shell would interpret as glob patterns (such as *, ? or [), and you are happy with the default word-splitting rules. If there's any doubt about whether your files pass that test, don't use this approach.

One possibility is to use printf:

printf '%s\t%s\n' $(cat your_file)

That performs word splitting on the contents of your_file and prints the resulting words in pairs, separated by tabs. Add more %s specifiers to the format string for extra columns; printf reuses the format until all words are consumed.
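The glob problem Wildcard pointed out can be avoided by turning off pathname expansion with set -f, here inside a subshell so the setting does not leak into the calling shell. A sketch for three columns, assuming a POSIX shell; the function name three_cols is hypothetical, and word splitting on IFS still applies:

```shell
# three_cols FILE: print the words of FILE three per line, tab-separated.
# The body is a subshell, so "set -f" (no glob expansion) stays local to it.
three_cols() (
  set -f
  printf '%s\t%s\t%s\n' $(cat "$1")   # unquoted on purpose: split into words
)
```

With this, words such as * or [x] in the file are printed as-is instead of being expanded to file names.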
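
Using GNU sed (\s, and \n in the replacement, are GNU extensions) together with paste, where ip.txt is the input file:
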
$ sed -E 's/\s+/\n/g' ip.txt | paste - -
a   aa
aaa b
bb  bbb
c   cc
ccc d
dd  ddd
e   ee
eee f
ff  fff
g   gg
ggg h
hh  hhh
i   ii
iii j
jj  jjj

$ sed -E 's/\s+/\n/g' ip.txt | paste - - -
a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj