In AWK, is it possible to specify "ranges" of fields?

I'm late, but this is quick and to the point, so I'll leave it here. In cases like this I normally just remove the fields I don't need with gsub and print what is left. Quick and dirty example: since you know your file is delimited by tabs, you can remove the first 31 fields (the {31} interval needs an awk with regex interval support, such as a recent GNU awk):

awk '{gsub(/^([^\t]*\t){31}/,"");print}'

Example of removing 4 fields, because I'm lazy:

printf "a\tb\tc\td\te\tf\n" | awk '{gsub(/^([^\t]*\t){4}/,"");print}'

Output:

e   f

This is shorter to write, easier to remember, and should use fewer CPU cycles than a horrendous loop.


Besides the awk answer by @Jerry, there are other alternatives:

Using cut (assumes tab delimiter by default):

cut -f32-58 foo >bar
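
As a quick sanity check of the range syntax, here is a small sketch on made-up input (the six-letter line is an assumption for illustration, not the asker's data). cut numbers fields from 1, and a range such as 3-5 is inclusive at both ends:

```shell
# Hypothetical six-field, tab-separated line; -f3-5 keeps fields 3, 4 and 5.
out=$(printf 'a\tb\tc\td\te\tf\n' | cut -f3-5)
printf '%s\n' "$out"
```

If the file used some other single-character delimiter, cut's -d option would select it.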

Using perl:

perl -nle '@a=split /\t/;print join "\t", @a[31..57]' foo >bar

Mildly revised version:

BEGIN { s = 32; e = 57; }

      { for (i=s; i<=e; i++) printf("%s%s", $(i), i<e ? OFS : "\n"); }

You can do it in awk by using RE intervals. For example, to print fields 3-6 of the records in this file:

$ cat file
1 2 3 4 5 6 7 8 9
a b c d e f g h i

would be:

$ gawk 'BEGIN{f="([^ ]+ )"} {print gensub("("f"{2})("f"{4}).*","\\3","")}' file
3 4 5 6
c d e f

I'm creating an RE segment f to represent every field plus its succeeding field separator (for convenience), then using that in gensub() to match 2 of those (i.e. the first 2 fields), remember the next 4 for reference later using \3, and discard whatever comes after them. It's \3 rather than \2 because the parentheses inside f count as a group too, so the first f is group 2. For your tab-separated file where you want to print fields 32-57 (i.e. the 26 fields after the first 31) you'd use:

gawk 'BEGIN{f="([^\t]+\t)"} {print gensub("("f"{31})("f"{26}).*","\\3","")}' file

The above uses GNU awk for its gensub() function. With other awks you'd use sub() or match() and substr().
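
For awks without gensub(), one possible sketch is below. Note that it swaps in index() for match(), so it needs neither gensub() nor regex intervals; it is not the approach from the answer above, just a portable stand-in. The helper name subflds_portable and its three arguments (start, end, delimiter) are made up for illustration, and it assumes a single-character delimiter:

```shell
# Hypothetical helper: subflds_portable START END DELIM, reading lines on stdin.
subflds_portable() {
  awk -v s="$1" -v e="$2" -v d="$3" '
  {
    rest = $0
    # Drop the first s-1 fields, one delimiter at a time.
    for (i = 1; i < s; i++) {
      p = index(rest, d)
      if (p == 0) { rest = ""; break }   # fewer than s fields on this line
      rest = substr(rest, p + 1)
    }
    # Copy fields s..e, re-adding the delimiter between them.
    out = ""
    for (i = s; i <= e; i++) {
      p = index(rest, d)
      if (p == 0) { out = out rest; break }   # last field on the line
      out = out substr(rest, 1, p - 1) (i < e ? d : "")
      rest = substr(rest, p + 1)
    }
    print out
  }'
}

printf '1 2 3 4 5 6 7 8 9\na b c d e f g h i\n' | subflds_portable 3 6 ' '
```

For a tab-separated file you would pass '\t' as the third argument, since awk expands that escape in -v assignments.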

EDIT: Here's how to write a function to do the job:

gawk '
function subflds(s,e,   f) {
   f="([^" FS "]+" FS ")"
   return gensub( "(" f "{" s-1 "})(" f "{" e-s+1 "}).*","\\3","")
}
{ print subflds(3,6) }
' file
3 4 5 6
c d e f

Just set FS as appropriate. Note that this only works if your FS is a single character, and it will need a tweak for the default FS if your input can start with spaces and/or have multiple spaces between fields.
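
If the default FS or a multi-character FS matters, one way around the regex limitation is to let awk do the splitting and join the fields back together. The sketch below assumes that is acceptable; print_fields and fldrange are hypothetical names, and any extra arguments are passed straight through to awk:

```shell
# Hypothetical helper: print_fields START END [extra awk options], reading stdin.
print_fields() {
  s=$1; e=$2; shift 2
  awk -v first="$s" -v last="$e" "$@" '
  function fldrange(s, e,    i, out) {
    if (e > NF) e = NF                  # clamp to the fields actually present
    for (i = s; i <= e; i++)
      out = out (i > s ? OFS : "") $i   # join with OFS, no trailing separator
    return out
  }
  { print fldrange(first, last) }'
}

# Leading and repeated blanks are handled by the default field splitting.
printf '  1   2 3  4 5 6 7\n' | print_fields 3 6
```

The trade-off is that the original spacing between fields is replaced by OFS. For the tab-separated case you would call it as print_fields 32 57 -F '\t' -v OFS='\t'.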
