How does ` ... | awk '$1=$1'` remove extra spaces?

When we assign a value to a field variable (here, the value of $1 is assigned back to $1), awk rebuilds $0 by concatenating all the fields, separated by the output field separator OFS, which is a single space by default.
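One way to watch the rebuild happen is to save $0 before the assignment and print both versions. This is a small illustrative sketch: the variable name `before` is arbitrary, and `printf` is used instead of `echo -e` only to produce the tab portably.

```shell
printf 'foo   foo\tbar\n' | awk '{
  before = $0   # the record exactly as it was read
  $1 = $1       # assigning a field makes awk rebuild $0 from $1..$NF joined by OFS
  printf "before: [%s]\nafter:  [%s]\n", before, $0
}'
```

The "before" line still contains the original run of spaces and the tab; the "after" line is `foo foo bar`.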

The same rebuilding happens in the following examples as well:

echo -e "foo foo\tbar\t\tbar" | awk '$1=$1'
foo foo bar bar

echo -e "foo foo\tbar\t\tbar" | awk -v OFS=',' '$1=$1'
foo,foo,bar,bar

echo -e "foo foo\tbar\t\tbar" | awk '$3=1'
foo foo 1 bar

For GNU Awk (gawk), this behavior is documented here:
https://www.gnu.org/software/gawk/manual/html_node/Changing-Fields.html

$1 = $1 # force record to be reconstituted


echo "$string" | awk '$1=$1'

causes AWK to evaluate $1=$1, which assigns the field to itself and, as a side effect, rebuilds $0. AWK then uses the value of that assignment expression as the pattern: if it’s non-zero and non-empty, as it is for ordinary text, the pattern is true and AWK executes the default action, which is to print $0.
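Written out with an explicit pattern and action, the one-liner behaves like the longer form below; the parentheses and the explicit `print $0` only make the implicit parts visible.

```shell
# Shorthand: the assignment is the pattern; with no action, awk runs { print $0 }
echo 'a   b    c' | awk '$1=$1'

# Written-out equivalent: same pattern, explicit default action
echo 'a   b    c' | awk '($1 = $1) { print $0 }'
```

Both commands print `a b c`.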

The extra spaces are removed when AWK rebuilds $0: it concatenates all the fields using OFS as the separator, and OFS is a single space by default. When AWK parses a record, $0 holds the whole record as-is, while $1 through $NF hold the individual fields without their separators; as soon as any field is assigned to, $0 is reconstructed from those field values.
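The sketch below prints the fields themselves to show that they carry no whitespace, then the rebuilt $0. It assumes the default FS (a single space), under which leading and trailing blanks are ignored and any run of spaces or tabs separates fields.

```shell
printf '   foo \t  bar   \n' | awk '{
  # Default FS: leading/trailing blanks are skipped, runs of blanks split fields
  printf "NF=%d $1=[%s] $2=[%s]\n", NF, $1, $2
  $1 = $1       # rebuild: $0 becomes $1 OFS $2
  printf "$0=[%s]\n", $0
}'
```

This prints `NF=2 $1=[foo] $2=[bar]` followed by `$0=[foo bar]`, so the leading and trailing blanks disappear along with the inner ones.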

Whether AWK outputs anything in this example depends on the input:

echo "0      0" | awk '$1=$1'

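won’t output anything: $1 is "0", which AWK treats as the number 0, so the value of the assignment is false and the default print action never runs.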
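To print every line regardless of what $1 contains, one common option is to move the assignment into an action and follow it with an always-true pattern such as `1`. This is a sketch of that idiom, not the only way to write it (`{ $1 = $1; print }` works the same way).

```shell
# The assignment now sits in an action, so its value no longer decides printing;
# the trailing pattern 1 is always true and triggers the default print of $0
echo "0      0" | awk '{ $1 = $1 } 1'
```

This prints `0 0`, whereas `awk '$1=$1'` prints nothing for the same input.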
