Why does awk behave differently for $1 if the value is 0 (number zero)?

From The GNU Awk User’s Guide:

An assignment is an expression, so it has a value—the same value that is assigned. Thus, ‘z = 1’ is an expression with the value one.

So

  • echo 0 | awk '$1=$1': the pattern evaluates to 0 (FALSE), so nothing is printed

  • echo 1 | awk '$1=$1': the pattern evaluates to 1 (TRUE), so the default action, print, is executed
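
A minimal sketch of the two points above, assuming any POSIX-style awk is on the PATH. The variable names out_zero and out_one are made up for this illustration:

```shell
# An assignment is an expression whose value is the assigned value.
awk 'BEGIN { print (z = 1) }'          # prints 1

# Used as a pattern, that value decides whether the default print runs.
out_zero=$(echo 0 | awk '$1=$1')       # value 0: false, nothing printed
out_one=$(echo 1 | awk '$1=$1')        # value 1: true, record printed

printf 'input 0: [%s]\n' "$out_zero"   # input 0: []
printf 'input 1: [%s]\n' "$out_one"    # input 1: [1]
```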


I don't think it is a matter of the numerical value: the standard conversions take care of that (here, at least).

The OP shows four different awk programs, all variations on: pattern { action }

(a) $1 = $1

That reassigns $1 to itself. It is not a boolean test; it is (effectively) a no-op whose value is the value of $1. If $1 is 0, the pattern is false and the default print action is skipped completely. If $1 is non-zero, the input is printed.

(b) { $1 = $1; print; }

That reassigns $1 to itself, also a no-op. In the absence of a pattern, the action is performed and the input is always printed.

(c) $1 == $1

That is a boolean expression that is always true. 0 is 0 and 1 is 1 (and aardvark is aardvark). In the absence of an action, the input is always printed.

(d) { $1 == $1; print; }

There is no pattern. The comparison evaluates to a true boolean which is discarded. The input is always printed.
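
To see the four cases side by side, here is a small sketch that feeds the single line 0 to each program. run_variant is a hypothetical helper name used only for this illustration:

```shell
# Feed the input line "0" to the awk program given as the first argument.
run_variant() {
  echo 0 | awk "$1"
}

a=$(run_variant '$1 = $1')               # (a) pattern has value 0: nothing printed
b=$(run_variant '{ $1 = $1; print; }')   # (b) no pattern: always prints
c=$(run_variant '$1 == $1')              # (c) comparison is always true: prints
d=$(run_variant '{ $1 == $1; print; }')  # (d) no pattern: always prints

printf '(a)[%s] (b)[%s] (c)[%s] (d)[%s]\n' "$a" "$b" "$c" "$d"
# (a)[] (b)[0] (c)[0] (d)[0]
```

Only (a) depends on the input value, because it is the only one where the assignment itself is the pattern.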


The existing answers fail to explain why

echo 0 | awk '$0="0"'
echo 0 | awk '$0=substr($0,1)'
echo 0 | awk '$0=$0""'

will all print 0, but

echo 0 | awk '$0'
echo 000 | awk '$0'

won't print anything, though in all the cases, the pattern expression evaluates to 0.

How come 0 is true in one case and false in the other?

That's because the "field variables" (the result of the $ operator) are treated as a special case, and (if possible) are automatically converted to numeric strings, which, if numerically equal to 0, will be considered false when used in a boolean context:

A string value shall be considered a numeric string if it comes from one of the following:

  1. Field variables

  2. Input from the getline() function

  3. FILENAME

  4. ARGV array elements

  5. ENVIRON array elements

  6. Array elements created by the split() function

  7. A command line variable assignment

  8. Variable assignment from another numeric string variable

and if it looks like a number (the standard gives the whole description).
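
The difference can be made visible by printing the truth value directly. This is only a sketch; the variable names are invented for the illustration, and it relies on the two rules described above (a field that looks like a number is a numeric string, while a string constant or the result of a concatenation is a plain string):

```shell
# A field that looks like a number is a numeric string: 0 counts as false.
from_field=$(echo 0 | awk '{ print ($1 ? "true" : "false") }')

# A string constant is a plain string: it is non-empty, so it counts as true.
from_const=$(awk 'BEGIN { print ("0" ? "true" : "false") }')

# Concatenation yields a plain string, even when it is built from a field.
from_concat=$(echo 0 | awk '{ print (($1 "") ? "true" : "false") }')

echo "field: $from_field, constant: $from_const, concatenation: $from_concat"
# field: false, constant: true, concatenation: true
```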

Please also read the RATIONALE for the reasons why the concept of numeric strings and this special-casing was needed, especially the bit about a comparison like echo 0 000 | awk '$1==$2' being true, but not echo 0 | awk '$1=="000"'.
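
The two comparisons mentioned there can be tried directly. In this sketch both_fields and field_const are illustrative variable names:

```shell
# Both operands are numeric strings: compared as numbers, 0 == 0 is true.
both_fields=$(echo 0 000 | awk '$1==$2')

# One operand is a string constant: compared as strings, "0" differs from "000".
field_const=$(echo 0 | awk '$1=="000"')

printf 'fields: [%s]\n' "$both_fields"          # fields: [0 000]
printf 'field vs constant: [%s]\n' "$field_const"   # field vs constant: []
```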


As another quirk, notice that, at least in some implementations, $0 (the current input record) loses its magical "numeric string" property if an assignment to a subfield causes it to be recomputed:

$ echo 0 | gawk '{$1=0} $0'
0

This does not seem to be covered by the standard, though it matches the behaviour of nawk/bwk awk, on which the standard awk is based (but not that of mawk).

Also, awk implementations are allowed to recognize NAN, INF and INFINITY in the input as the corresponding floating point numbers, though support for this is spotty and inconsistent. You may still be bitten by e.g.

echo But his daughter named Nan | awk '$NF'

not printing anything in FreeBSD's awk (bwk, original-awk).
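
If the intent of such a pattern is "the last field is not empty", one possible way around the problem (a sketch, not the only option) is to compare against a string constant, which forces a string comparison as described above. The variable name last_nonempty is made up for the example:

```shell
# Comparing with the string constant "" makes this a string comparison,
# so a last field such as "Nan" is never interpreted as a number here.
last_nonempty=$(echo But his daughter named Nan | awk '$NF != ""')

echo "$last_nonempty"   # But his daughter named Nan
```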
