How to print certain columns by name?

Maybe something like this:

$ cat t.awk
NR==1 {
    for (i=1; i<=NF; i++) {
        ix[$i] = i
    }
}
NR>1 {
    print $ix[c1], $ix[c2]
}
$ awk -f t.awk c1=id c2=name input 
1 ed
2 joe
$ awk -f t.awk c1=age c2=name input 
50 ed
70 joe

If you want to specify the columns to print on the command line, you could do something like this:

$ cat t.awk 
BEGIN {
    split(cols,out,",")
}
NR==1 {
    for (i=1; i<=NF; i++)
        ix[$i] = i
}
NR>1 {
    for(i=1; i <= length(out); i++)
        printf "%s%s", $ix[out[i]], OFS
    print ""
}
$ awk -f t.awk -v cols=name,age,id,name,id input 
ed 1 ed 50 1 
joe 2 joe 70 2 

(Note the -v switch to get the variable defined in the BEGIN block.)


csvkit

Convert the input data to a csv format and use a csv tool such as csvcut from the csvkit:

$ cat test-cols.dat 
id  name  age
1   ed    50
2   joe   70 

Install csvkit:

$ pip install csvkit

Use tr with its squeeze option -s to convert it into a valid csv file and apply csvcut:

$ cat test-cols.dat | tr -s ' ' ',' | csvcut -c id,age
id,age
1,50
2,70

If you want to return to the old data format, you can use tr ',' ' ' | column -t

$ cat test-cols.dat | tr -s ' ' ',' | csvcut -c id,age | tr ',' ' ' | column -t
id  age
1   50
2   70

Notes

  • csvkit supports also different delimiters (shared option -d or --delimiter), but returns a csv file:

    • If the file uses only spaces to separate columns (no tabs at all), following works

      $ csvcut -d ' ' -S -c 'id,age' test-cols.dat
      id,age
      1,50
      2,70
      
    • If the file uses a tab to separate columns, following works and csvformat can be used to get back tsv file:

      $ csvcut -t -c 'id,age' test-cols.dat | csvformat -T
      id  age
      1   50
      2   70
      

      As far as I have checked, only a single tab is allowed.

  • csvlook can format the table in a markdown table format

    $ csvcut -t -c "id,age" test-cols.dat | csvlook
    | id | age |
    | -- | --- |
    |  1 |  50 |
    |  2 |  70 |
    
  • UUOC (Useless Use Of Cat): I like it this way to construct the command.


Just trowing a Perl solution into the lot:

#!/usr/bin/perl -wnla

BEGIN {
    @f = ('id', 'age');   # field names to print
    print "@f";           # print field names
}

if ($. == 1) {            # if line number 1
    @n = @F;              #   get all field names
} else {                  # or else
    @v{@n} = @F;          #   map field names to values
    print "@v{@f}";       #   print values based on names
}

Tags:

Awk