combine text files column-wise

You just need the column command; tell it to use tabs to separate columns:

paste file1 file2 | column -s $'\t' -t

To address the "empty cell" controversy, we just need the -n option to column:

$ paste <(echo foo; echo; echo barbarbar) <(seq 3) | column -s $'\t' -t
foo        1
2
barbarbar  3

$ paste <(echo foo; echo; echo barbarbar) <(seq 3) | column -s $'\t' -tn
foo        1
           2
barbarbar  3

My column man page indicates -n is a "Debian GNU/Linux extension." My Fedora system does not exhibit the empty cell problem: it appears to be derived from BSD and the man page says "Version 2.23 changed the -s option to be non-greedy"

You're looking for the handy dandy pr command:

paste file1 file2 | pr -t -e24

The "-e24" is "expand tab stops to 24 spaces". Luckily, paste puts a tab-character between columns, so pr can expand it. I chose 24 by counting the characters in "Recursively enumerable" and adding 2.
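Rather than counting characters by hand, the tab stop can be derived from the left-hand file itself. This is only a minimal sketch: it assumes GNU wc (whose -L option prints the display width of the longest line) and exactly two input files, and paste_pr is a made-up helper name, not an existing command.

```shell
# Sketch only: derive the pr tab stop from the widest line of the first file.
# paste_pr is a hypothetical helper; wc -L is a GNU extension.
paste_pr() (                           # ( ) body: variables stay inside a subshell
  width=$(( $(wc -L <"$1") + 2 ))      # widest line plus a 2-column gutter
  paste "$1" "$2" | pr -t -e"$width"   # expand each tab to the next multiple of width
)
```

Called as paste_pr file1 file2, it should behave like the one-liner above with the 24 worked out automatically.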

Update: Here is a much simpler script (than the one at the end of the question) for tabulated output. Just pass filenames to it as you would to paste. It uses HTML to make the frame, so it is tweakable. It preserves multiple spaces, and the column alignment is preserved when it encounters unicode characters. However, the way the editor or viewer renders the unicode is another matter entirely...

┌──────────────────────┬────────────────┬──────────┬────────────────────────────┐
│ Languages            │ Minimal        │ Chomsky  │ Unrestricted               │
├──────────────────────┼────────────────┼──────────┼────────────────────────────┤
│ Recursive            │ Turing machine │ Finite   │     space indented         │
├──────────────────────┼────────────────┼──────────┼────────────────────────────┤
│ Regular              │ Grammars       │          │ ➀ unicode may render oddly │
├──────────────────────┼────────────────┼──────────┼────────────────────────────┤
│ 1 2  3   4    spaces │                │ Symbol-& │ but the column count is ok │
├──────────────────────┼────────────────┼──────────┼────────────────────────────┤
│                      │                │          │ Context                    │
└──────────────────────┴────────────────┴──────────┴────────────────────────────┘

#!/bin/bash
{ echo -e "<html>\n<table border=1 cellpadding=0 cellspacing=0>"
  paste "$@" |sed -re 's#(.*)#\x09\1\x09#' -e 's#\x09# </pre></td>\n<td><pre> #g' -e 's#^ </pre></td>#<tr>#' -e 's#\n<td><pre> $#\n</tr>#'
  echo -e "</table>\n</html>"
} |w3m -dump -T 'text/html'

---

A synopsis of the tools presented in the answers (so far).
I've had a pretty close look at them; here is what I've found:

paste   # This tool is common to all the answers presented so far.
        # It can handle multiple files, and therefore multiple columns... Good!
        # It delimits each column with a Tab... Good.
        # Its output is not tabulated.

All of the tools below remove this delimiter!... Bad if you need a delimiter.

column  # It removes the Tab delimiter, so field identification is purely by column alignment, which it seems to handle quite well. I haven't spotted anything awry...
        # Aside from not having a unique delimiter, it works fine!

expand  # Used with a single tab setting, it is unpredictable beyond 2 columns (-t does accept a list of tab stops, but you have to work them out yourself).
        # The alignment of columns is not accurate when handling unicode, and it removes the Tab delimiter, so field identification is purely by column alignment.

pr      # Only has a single tab setting, so it is unpredictable beyond 2 columns.
        # The alignment of columns is not accurate when handling unicode, and it removes the Tab delimiter, so field identification is purely by column alignment.
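For more than two columns, expand can be given one tab stop per column boundary through a comma-separated -t list. The sketch below computes such a list from the input files. It assumes GNU wc for -L and plain ASCII content (the unicode caveat above still applies), and paste_expand is a hypothetical helper name.

```shell
# Sketch only: give expand one explicit tab stop per column boundary.
# paste_expand is a hypothetical helper; wc -L is a GNU extension.
paste_expand() (                        # ( ) body: variables stay inside a subshell
  stops="" pos=0 n=0
  for f in "$@"; do
    n=$((n + 1))
    [ "$n" -lt "$#" ] || break          # the last column needs no stop after it
    pos=$(( pos + $(wc -L <"$f") + 2 )) # column width plus a 2-space gutter
    stops="${stops:+$stops,}$pos"       # append to the comma-separated list
  done
  if [ -n "$stops" ]; then
    paste "$@" | expand -t "$stops"
  else
    paste "$@"                          # zero or one file: nothing to align
  fi
)
```

Usage would be paste_expand F1 F2 F3 F4. The Tab delimiter is still replaced by spaces, so this only addresses alignment, not field identification.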

To me, column is the obvious best solution as a one-liner. If you want either the delimiter or an ASCII-art tabulation of your files, read on; otherwise, column is pretty darn good :)
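If all you need is aligned output that keeps a visible delimiter, a small two-pass awk filter may be enough. This is a sketch, not one of the tools reviewed above: align_keep_delim and the " | " separator are illustrative choices, and widths are measured with awk's length(), which counts characters in some awk implementations and bytes in others, so non-ASCII text may still misalign.

```shell
# Sketch only: align tab-separated input while keeping a visible delimiter.
# align_keep_delim and the " | " separator are illustrative, not standard.
align_keep_delim() {
  awk -F'\t' -v sep=' | ' '
    {                                   # pass 1: remember cells, track widths
      if (NF > nf) nf = NF
      for (i = 1; i <= NF; i++) {
        cell[NR, i] = $i
        if (length($i) > w[i]) w[i] = length($i)
      }
    }
    END {                               # pass 2: pad each cell and re-join
      for (r = 1; r <= NR; r++) {
        line = ""
        for (i = 1; i <= nf; i++) {
          c = cell[r, i]
          if (i < nf)                   # the last column is left unpadded
            while (length(c) < w[i]) c = c " "
          line = line (i > 1 ? sep : "") c
        }
        print line
      }
    }'
}
```

It reads paste's output on stdin, for example: paste file1 file2 | align_keep_delim. Because the whole input is held in memory until the END block, it suits small files rather than large ones.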

Here is a script which takes any number of files and creates an ASCII-art tabulated presentation. (Bear in mind that unicode may not render to the expected width, e.g. ௵, which is a single character. This is quite different from the column numbers being wrong, as is the case in some of the utilities mentioned above.) The script's output, shown below, is from 4 input files, named F1 F2 F3 F4:

+------------------------+-------------------+-------------------+--------------+
| Languages              | Minimal automaton | Chomsky hierarchy | Grammars     |
| Recursively enumerable | Turing machine    | Type-0            | Unrestricted |
| Regular                | Finite            | —                 |              |
| Alphabet               |                   | Symbol            |              |
|                        |                   |                   | Context      |
+------------------------+-------------------+-------------------+--------------+

#!/bin/bash

# Note: The next line is for testing purposes only!
set F1 F2 F3 F4 # Simulate commandline filename args $1 $2 etc...

p=' '                                # The pad character
# Get line and column stats
cc=${#@}; lmax=                      # Count of columns (== input files)
for c in $(seq 1 $cc) ;do            # Filenames from the commandline 
  F[$c]="${!c}"        
  wc=($(wc -l -L <${F[$c]}))         # File length and width of longest line 
  l[$c]=${wc[0]}                     # File length  (per file)
  L[$c]=${wc[1]}                     # Longest line (per file) 
  ((lmax<${l[$c]})) && lmax=${l[$c]} # Length of longest file
done
# Determine line-count deficits  of shorter files
for c in $(seq 1 $cc) ;do  
  ((${l[$c]}<lmax)) && D[$c]=$((lmax-${l[$c]})) || D[$c]=0 
done
# Build '\n' strings to cater for short-file deficits
for c in $(seq 1 $cc) ;do
  for n in $(seq 1 ${D[$c]}) ;do
    N[$c]=${N[$c]}$'\n'
  done
done
# Build the command to suit the number of input files
source=$(mktemp)
>"$source" echo 'paste \'
for c in $(seq 1 $cc) ;do
    ((${L[$c]}==0)) && e="x" || e=":a -e \"s/^.{0,$((${L[$c]}-1))}$/&$p/;ta\""
    >>"$source" echo '<(sed -re '"$e"' <(cat "${F['$c']}"; echo -n "${N['$c']}")) \'
done
# include the ASCII-art Table framework
>>"$source" echo ' | sed  -e "s/.*/| & |/" -e "s/\t/ | /g" \'   # Add vertical frame lines
>>"$source" echo ' | sed -re "1 {h;s/[^|]/-/g;s/\|/+/g;p;g}" \' # Add top and bottom frame lines
>>"$source" echo '        -e "$ {p;s/[^|]/-/g;s/\|/+/g}"'
>>"$source" echo  
# Run the code
source "$source"
rm     "$source"
exit

Here is my original answer (trimmed a bit in light of the above script).

Use wc to get the column width, sed to right-pad with a visible character (a . is used just for this example), and then paste to join the two columns with a Tab character:

paste <(sed -re :a -e 's/^.{1,'"$(($(wc -L <F1)-1))"'}$/&./;ta' F1) F2

# output (No trailing whitespace)
Languages.............  Minimal automaton
Recursively enumerable  Turing machine
Regular...............  Finite

If you want to pad out the right column:

paste <( sed -re :a -e 's/^.{1,'"$(($(wc -L <F1)-1))"'}$/&./;ta' F1 ) \
      <( sed -re :a -e 's/^.{1,'"$(($(wc -L <F2)-1))"'}$/&./;ta' F2 )  

# output (With trailing whitespace)
Languages.............  Minimal automaton
Recursively enumerable  Turing machine...
Regular...............  Finite...........
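The same padding step can be wrapped in a small function so that each file is padded to its own widest line. This is a sketch under assumptions: pad_right is a hypothetical helper, wc -L is a GNU extension, and the padding is done with awk instead of the sed loop above, so empty lines are padded too (the sed version leaves them alone).

```shell
# Sketch only: right-pad every line of a file to the width of its longest line.
# pad_right FILE [PADCHAR] is a hypothetical helper; wc -L is a GNU extension.
pad_right() (                          # ( ) body: variables stay inside a subshell
  file=$1
  pad=${2:-.}                          # pad character, "." by default as in the example
  width=$(wc -L <"$file")              # display width of the longest line
  awk -v w="$width" -v p="$pad" '
    { while (length($0) < w) $0 = $0 p # append pad characters until wide enough
      print }' "$file"
)
```

With bash process substitution this could be used as paste <(pad_right F1) <(pad_right F2 ' '), which pads the second column with spaces instead of dots.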

