Make awk produce error on non-numeric

A reasonable test is to match the field against a pattern modelled on what strtod accepts, since strtod is the method awk uses to convert strings to numbers:

$2 !~ /^ *[+-]?[[:digit:]]/ { print "NAN: " $2; exit 1; }

The above differs from strtod in that it does not treat INFINITY or NAN as "numbers". The allowance for leading spaces can be dropped when awk's default field splitting is in effect, because the fields then never contain leading whitespace:

$2 !~ /^[+-]?[[:digit:]]/ { print "NAN: " $2; exit 1; }
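
As a quick illustration, here is a minimal sketch that runs this kind of prefix test over a few made-up records. The sample data and the helper name check_prefix are assumptions for the demo, and the pattern is anchored with ^ so that it has to match at the start of the field.

```shell
# check_prefix: hypothetical helper; reads records on stdin and tests field 2.
check_prefix() {
    awk '$2 !~ /^[+-]?[[:digit:]]/ { print "NAN: " $2; exit 1 }'
}

# The third record has a non-numeric second field, so awk reports it and exits 1.
printf 'a 12\nb -3.5\nc x9\n' | check_prefix || echo "rejected with status $?"
```

Being a prefix test, this still accepts a value such as 12abc, because only the start of the field is examined.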

A further refinement, thanks to Stéphane's comment and answer here:

$2 !~ /^[+-]?([[:digit:]]*\.?[[:digit:]]*([eE][-+]?[[:digit:]]+)?|0[xX][[:xdigit:]]*\.?[[:xdigit:]]*([pP][-+]?[[:digit:]]+)?)$/ { print "NAN: " $2; exit 1; }

Broken out for slightly better legibility, that regex is:

/^[+-]?([[:digit:]]*\.?[[:digit:]]*([eE][-+]?[[:digit:]]+)?|\
        0[xX][[:xdigit:]]*\.?[[:xdigit:]]*([pP][-+]?[[:digit:]]+)?)$/

... where the intention is to allow an optional leading + or -, then either a floating point number or a hexadecimal number. The floating point number has optional leading digits, an optional separator (here fixed to be a period .), more optional digits, and an optional exponent. The hex number must start with 0x or 0X, followed by optional hex digits, an optional separator, more optional hex digits, and optionally a "power" (exponent) introduced by p or P. The entire second field must match one of those formats (as anchored by ^ and $). Because every part of the mantissa is optional, the pattern also accepts an empty field, a lone sign, or a lone period. Omitted here, for the purposes of this question, are the NAN and INFINITY options.
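
To see which inputs the full pattern accepts, the sketch below applies it to a handful of candidate strings, one per line. The helper name classify and the sample values are assumptions made for illustration; the regex itself is the one shown above.

```shell
# classify: hypothetical helper; labels each input line as numeric or NAN
# using the fully anchored decimal/hexadecimal pattern.
classify() {
    awk '{
        if ($0 ~ /^[+-]?([[:digit:]]*\.?[[:digit:]]*([eE][-+]?[[:digit:]]+)?|0[xX][[:xdigit:]]*\.?[[:xdigit:]]*([pP][-+]?[[:digit:]]+)?)$/)
            print $0 ": numeric"
        else
            print $0 ": NAN"
    }'
}

# The first four values match; abc, 1e (exponent without digits) and 12abc do not.
printf '%s\n' 1 -1.5e3 0x1F 0x1.8p3 abc 1e 12abc | classify
```

Note that passing this test only says the text has a numeric shape. Whether a given awk then converts a hexadecimal string to its value in arithmetic depends on the implementation.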

Another option is to force a numeric conversion and check whether the result is zero. If it is, compare the original input against the forms that legitimately convert to zero: an optional + or -, followed either by zeros or by a period and zeros:

{ number=0 + $2;
  if (!number && $2 !~ /^[+-]?(0+|\.0+)/)
    print "NAN: "$2;
}
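
A small sketch of this conversion-based check on made-up input follows. The helper name flag_non_numeric and the sample records are assumptions, and the alternation is grouped as (0+|\.0+) so that the ^ anchor and the optional sign apply to both branches.

```shell
# flag_non_numeric: hypothetical helper; reports field 2 when it converts
# to zero without looking like a written zero.
flag_non_numeric() {
    awk '{
        number = 0 + $2                            # force a numeric conversion
        if (!number && $2 !~ /^[+-]?(0+|\.0+)/)    # zero result, but not a zero literal
            print "NAN: " $2
    }'
}

# Only "foo" is reported: 0, -0.0 and .0 are real zeros, 42 is non-zero,
# and 12abc converts to 12 because trailing junk is ignored.
printf 'a 0\nb -0.0\nc .0\nd 42\ne foo\nf 12abc\n' | flag_non_numeric
```

This approach only catches fields that convert to zero, so a value with a numeric prefix such as 12abc slips through.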

I ended up with this:

awk -v col="$col" '
typeof($col) != "strnum" {
    print "Error on line " NR ": " $col " is not numeric"
    noprint=1
    exit 1
}
{
    sum+=$col
}
END {
    if(!noprint)
        print sum
}' "$file"

This uses typeof, which is a GNU awk (gawk) extension. For an input field, typeof($col) returns "strnum" if the field looks like a number, and "string" or "unassigned" if it does not.

See "Can I determine type of an awk variable?"
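
To get a feel for what typeof reports on fields, here is a minimal sketch that prints the type of the first field of each input line. The helper name type_of_field and the sample values are assumptions, and it needs a gawk recent enough to provide typeof.

```shell
# type_of_field: hypothetical helper; prints typeof() for the first field of each line.
type_of_field() {
    gawk '{ print $1 ": " typeof($1) }'
}

# typeof() is gawk-only, so skip the demo when gawk is not installed.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' 42 -1.5e3 abc 12abc | type_of_field
else
    echo "gawk not found; skipping typeof demo"
fi
```

With a recent gawk, 42 and -1.5e3 should be reported as strnum, while abc and 12abc should be reported as string, so this test also rejects numbers followed by trailing junk.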


awk -v col=2 '
    $col+0==0 && $col!~/^[+-]?0/ { print "bad number " $col > "/dev/stderr" } 
    {sum+=$col}
    END{print sum}' input-file

It's up to you to complicate it if you also want it to accept .0 or .0e+33 as valid representations of 0. Notice that awk ignores trailing junk when converting strings to numbers: "1.4e1e3"+0, "1.4e1.e7"+0 and "14+13"+0 all evaluate to 14.
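
The trailing-junk behaviour is easy to confirm, and it shows what the check does and does not catch. In the sketch below, the helper names show_conversions and sum_checked and the sample records are assumptions for illustration.

```shell
# show_conversions: hypothetical helper printing the three conversions.
show_conversions() {
    awk 'BEGIN { print "1.4e1e3" + 0, "1.4e1.e7" + 0, "14+13" + 0 }'
}
show_conversions    # prints: 14 14 14

# sum_checked: hypothetical wrapper around the same check, reading stdin.
sum_checked() {
    awk -v col=2 '
        $col+0==0 && $col!~/^[+-]?0/ { print "bad number " $col > "/dev/stderr" }
        { sum += $col }
        END { print sum }'
}

# "12abc" converts to 12, so only "foo" is reported on stderr; the sum is 3 + 12 + 0.
printf 'a 3\nb 12abc\nc foo\n' | sum_checked
```

If values like 12abc should be rejected as well, a fully anchored pattern is needed instead of a check on the converted value.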