Check if a field is an integer in awk

Use

mongostat | awk -F ' *' '$19 ~ /^[0-9]+$/ { print "Number of connections: " $19 }'

$19 ~ /^[0-9]+$/ checks whether $19 matches the regex ^[0-9]+$ (i.e., whether it consists only of digits), and the associated action runs only if it does.
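A minimal sketch for trying the filter without a running MongoDB server. The sample_input function is hypothetical: it prints made-up lines with 19 columns, so the header names and values are illustrative only and are not real mongostat output. The header line is skipped because its 19th field (conn) is not all digits.

```shell
# Hypothetical helper: a made-up header and one data line, 19 columns each.
# Only column 19 matters; the other values are placeholders.
sample_input() {
  printf '%s\n' \
    'c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 conn' \
    '1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 42'
}

# Print column 19 only when it consists entirely of digits.
sample_input | awk '$19 ~ /^[0-9]+$/ { print "Number of connections: " $19 }'
# prints: Number of connections: 42
```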

By the way, the special field separator is probably unnecessary. The default field separator treats any run of spaces and tabs as a single separator and ignores leading and trailing blanks, so unless that changes which column ends up in $19,

mongostat | awk '$19 ~ /^[0-9]+$/ { print "Number of connections: " $19 }'

should work fine.
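A quick way to check the default splitting rules (a sketch; the input strings are made up): runs of spaces and tabs count as one separator and leading blanks are ignored. With an explicit single-space separator such as -F '[ ]', a leading blank instead produces an empty first field, which would shift every later field number, including $19.

```shell
# Default FS: mixed spaces/tabs separate fields; leading/trailing blanks are ignored.
printf '  a \t b\t\tc  \n' | awk '{ print NF ": " $1 "," $2 "," $3 }'
# prints: 3: a,b,c

# Explicit single-space regex FS: the leading blank yields an empty $1.
printf ' a b\n' | awk -F '[ ]' '{ print NF }'
# prints: 3   (fields are "", "a", "b")
```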


Check whether this field consists only of digits by matching it against the regex ^[0-9]+$:

$19~/^[0-9]+$/

^ stands for the beginning of the string and $ for the end, so we are checking that the field consists of digits from beginning to end. The + requires at least one digit; otherwise an empty field would also match (so a line with fewer than 19 fields would always match).
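The role of + can be checked on a made-up line with only three fields, where $19 is the empty string: ^[0-9]*$ accepts the empty string, while ^[0-9]+$ does not.

```shell
# 'a b c' has only 3 fields, so $19 is empty.
echo 'a b c' | awk '$19 ~ /^[0-9]*$/ { print "star matched" }'   # prints: star matched
echo 'a b c' | awk '$19 ~ /^[0-9]+$/ { print "plus matched" }'   # prints nothing
```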

All together:

mongostat | awk 'BEGIN{FS=" *"} $19~/^[0-9]+$/ {print "Number of connections: "$19}'

You have to be very careful here. The answer is not as simple as you might imagine:

  • an integer can have a sign, so your tests need to take this into account: the integers -123 and +123 are not recognised as integers by the tests proposed earlier.
  • awk converts values between numbers (which are always floating point) and strings as needed. A number is converted to a string as if by sprintf: if the value is integral, the format %d is used; otherwise the format stored in CONVFMT (default %.6g) is used. More detailed explanations are at the bottom of this post. So checking whether a number is an integer and checking whether a string is an integer are two different things.
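Both points can be seen with a couple of one-liners (a sketch; the sample values are made up). The first accepts an optional sign in the regex; the second shows the number-to-string rule, where the integral value 123.0 becomes "123" while 1.0000001 goes through the default CONVFMT of "%.6g" and becomes "1".

```shell
# 1) Sign-aware string test: optional + or - followed by one or more digits.
printf '%s\n' 123 -123 +123 12a | awk '{ print $1, ($1 ~ /^[-+]?[0-9]+$/) }'
# prints: 123 1, -123 1, +123 1, 12a 0 (one per line)

# 2) Number-to-string conversion: integral values use %d, others use CONVFMT.
awk 'BEGIN { a = 123.0; b = 1.0000001; s1 = a ""; s2 = b ""; print s1, s2 }'
# prints: 123 1
```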

So when you use a regular expression to test whether a value is an integer, it works flawlessly as long as the value is still treated as a string (such as an unprocessed field). However, if the value is a number, awk first converts it to a string before applying the regular expression, and this can fail:

# string test: optional sign followed by one or more digits
function is_integer(x) { return x ~ /^[-+]?[0-9]+$/ }
BEGIN { n=split("+0 -123 +123.0 1.0000001",a)
        for(i=1;i<=n;++i) print a[i],is_integer(a[i]), is_integer(a[i]+0), a[i]+0
}

which outputs:

+0          1          1        0
-123        1          1        -123
+123.0      0          1        123        << QUESTIONABLE
1.0000001   0          1        1          << FAIL
            ^          ^
          test        test
        as string   as number

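As you can see, the last case fails: a regex match is a string operation, so the number 1.0000001 is first converted with CONVFMT ("%.6g"), which yields the string "1". The +123.0 case is questionable for a related reason: as a number it equals 123, which is integral and therefore converted with %d to "123".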
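One way to confirm that CONVFMT is responsible is to rerun the same test with a higher-precision CONVFMT. This is a sketch: rerun_with_precise_convfmt is a hypothetical helper name, and "%.17g" is just one choice of format, picked because 17 significant digits are enough to tell any two doubles apart.

```shell
# Hypothetical helper: the string-based test from above, with a precise CONVFMT.
rerun_with_precise_convfmt() {
  awk '
  # string test: optional sign followed by one or more digits
  function is_integer(x) { return x ~ /^[-+]?[0-9]+$/ }
  BEGIN {
      CONVFMT = "%.17g"   # keep enough digits so 1.0000001 is not shown as 1
      n = split("+0 -123 +123.0 1.0000001", a)
      for (i = 1; i <= n; ++i) print a[i], is_integer(a[i]), is_integer(a[i]+0)
  }'
}

rerun_with_precise_convfmt
# last line now reads: 1.0000001 0 0
```

The +123.0 row still reports 1 when tested as a number: 123.0 is an integral value, so it is converted with %d regardless of CONVFMT.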

A more generic solution to validate if a variable represents an integer would be the following:

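# x+0 == x compares numerically only when x is a number or a numeric string
# (e.g. a field or a split() element); otherwise it is a string comparison.
function is_number(x)   { return x+0 == x }
function is_string(x)   { return ! is_number(x) }
function is_float(x)    { return x+0 == x && int(x) != x }
function is_integer(x)  { return x+0 == x && int(x) == x }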
BEGIN { n=split( "0 +0 -0 123 +123 -123 0.0 +0.0 -0.0 123.0 +123.0 -123.0  1.23 1.0000001 -1.23E01 123ABD STRING",a)
    for(i=1;i<=n;++i) {
        print a[i], is_number(a[i]), is_float(a[i]), is_integer(a[i]), \
              a[i]+0, is_number(a[i]+0), is_float(a[i]+0), is_integer(a[i]+0)
    }
}
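As a usage sketch, the same helpers can be applied to input fields, which are numeric strings when they look like numbers. The classify_fields wrapper and the sample input are hypothetical. Note that a string constant written in the program, such as "123.0", is not a numeric string, so is_number("123.0") compares "123" with "123.0" as text and returns 0.

```shell
# Hypothetical helper: classify every field of every input line.
classify_fields() {
  awk '
  function is_number(x)  { return x+0 == x }
  function is_integer(x) { return x+0 == x && int(x) == x }
  {
      for (i = 1; i <= NF; ++i)
          print $i, (is_integer($i) ? "integer" : is_number($i) ? "non-integer number" : "not a number")
  }'
}

echo '12 -7 3.5 abc 123ABD' | classify_fields
# 12 integer / -7 integer / 3.5 non-integer number / abc not a number / 123ABD not a number
```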

This method still classifies 123.0 as an integer rather than a float, but that is because awk has only one numeric type (floating point), so 123.0 and 123 are the same value.
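A small sketch of what "awk only knows floating point numbers" means in practice, assuming an awk that stores numbers as IEEE 754 doubles (as gawk without -M and mawk do): 123.0 and 123 are indistinguishable, and above 2^53 not every integer can be represented.

```shell
awk 'BEGIN {
    print (123.0 == 123)   # 1: the same double value
    x = 2^53               # 9007199254740992
    print (x + 1 == x)     # 1: x + 1 rounds back to x in double precision
}'
```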


A numeric value that is exactly equal to the value of an integer (see Concepts Derived from the ISO C Standard) shall be converted to a string by the equivalent of a call to the sprintf function (see String Functions) with the string "%d" as the fmt argument and the numeric value being converted as the first and only expr argument. Any other numeric value shall be converted to a string by the equivalent of a call to the sprintf function with the value of the variable CONVFMT as the fmt argument and the numeric value being converted as the first and only expr argument. The result of the conversion is unspecified if the value of CONVFMT is not a floating-point format specification. This volume of POSIX.1-2017 specifies no explicit conversions between numbers and strings. An application can force an expression to be treated as a number by adding zero to it, or can force it to be treated as a string by concatenating the null string ( "" ) to it.

Source: POSIX awk specification (POSIX.1-2017)
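The forcing idioms from the quote and the integer-versus-CONVFMT rule can be checked with a short sketch (the input values and the "%.2f" format are arbitrary choices for illustration):

```shell
# "010" and "10" are numeric strings, so == compares them as numbers (1);
# concatenating "" forces string values, so the comparison becomes textual (0).
echo '010 10' | awk '{ r1 = ($1 == $2); r2 = (($1 "") == ($2 "")); print r1, r2 }'
# prints: 1 0

# Integral values are converted with %d; other values use CONVFMT.
awk 'BEGIN { CONVFMT = "%.2f"; a = 3; b = 3.14159; s1 = a ""; s2 = b ""; print s1, s2 }'
# prints: 3 3.14
```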
