Separator between statements in awk

Very good question! I think the key is this: "Thus, the program shown at the start of this section could also be written this way:"

Writing it this way is not mandatory; it is simply an alternative. This means (and testing confirms) that both of the following commands are correct:

$ awk '/12/ { print $0 } /21/ { print $0 }' file
$ awk '/12/ { print $0 } ; /21/ { print $0 }' file

I think this use of the semicolon exists to support very short, idiomatic code, for example when we omit the action part and want to apply multiple rules on the same line:

$ awk '/12//21/' file
awk: cmd. line:2: /12//21/
awk: cmd. line:2:         ^ unexpected newline or end of string

In this case using a semicolon is mandatory to separate rules (=conditions):

$ awk '/12/;/21/' file

Since the {action} part is omitted in both rules, the default action, {print $0}, is performed for every rule whose condition matches.
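One consequence worth seeing in action: because each rule is evaluated independently, a line that matches both patterns is printed once per rule. A minimal sketch, assuming a made-up three-line input fed through printf instead of the original file:

```shell
# Two pattern-only rules separated by a semicolon.
# "1221" contains both "12" and "21", so both rules fire on it
# and the default action prints the line twice.
out=$(printf '12\n21\n1221\n' | awk '/12/;/21/')
printf '%s\n' "$out"
```

If each matching line should be printed only once, a single rule such as /12/ || /21/ would be the usual alternative.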


In gawk, these two quotes from the manual describe the issue:

An action consists of one or more awk statements, enclosed in braces (‘{…}’). Each statement specifies one thing to do. The statements are separated by newlines or semicolons.

A semicolon is a "separator" but not a "terminator".
The only valid terminator of an action is a closing brace (}).

Therefore, whatever follows an action's closing brace (}) must be another pattern{action} rule.

The mawk man page ("man mawk") has another description that may help clarify what awk should do:

Statements are terminated by newlines, semi-colons or both. Groups of statements such as actions or loop bodies are blocked via { ... } as in C. The last statement in a block doesn't need a terminator.
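As a small illustration of that wording, the sketch below runs the same action with and without a semicolon after the last statement. The input line x and the printed strings are placeholders chosen for the example, not taken from any manual:

```shell
# Without a trailing semicolon: ';' only separates the two statements.
a=$(printf 'x\n' | awk '{ print "one"; print "two" }')
# With a semicolon after the last statement: also accepted,
# since the last statement in a block may or may not be terminated.
b=$(printf 'x\n' | awk '{ print "one"; print "two"; }')
printf '%s\n' "$a" "$b"
```

Both forms should produce the same two lines of output.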

The "man nawk" explains it like this:

The pattern comes first, and then the action. Action statements are enclosed in { and }.

And, if you want to delve into the details, read the POSIX description:

action           : '{' newline_opt                             '}'
                 | '{' newline_opt terminated_statement_list   '}'
                 | '{' newline_opt unterminated_statement_list '}'
                 ;

Then look up what an "unterminated" statement list is.

Or, more simply, search for "Action" and read:

Any single statement can be replaced by a statement list enclosed in curly braces. The application shall ensure that statements in a statement list are separated by <newline> or <semicolon> characters.

Again: are separated by <newline> or <semicolon> characters
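The newline half of that rule is easy to overlook in one-liners. A minimal sketch, assuming a two-line sample input, where the statements inside the action are separated by newlines only and no semicolon appears:

```shell
# Statements inside the action are separated by newlines alone.
out=$(printf 'foo\nbar\n' | awk '/foo/ {
    print "foo found"
    print "whee"
}')
printf '%s\n' "$out"
```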


The semicolon between conditional blocks appears to be optional; only the semicolons between statements within blocks appear to be mandatory:

$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found" } /bar/ {print "bar found"}'
foo found
bar found
$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found" }; /bar/ {print "bar found"}'
foo found
bar found
$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found"; print "whee" }'
foo found
whee
$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found" print "whee" }'
gawk: cmd. line:1: /foo/ { print "foo found" print "whee" }
gawk: cmd. line:1:                           ^ syntax error

However, when the action block between two conditions is omitted in favor of the default (i.e. {print}), the semicolon becomes necessary:

$ echo -e "foo\nbar" | gawk '/foo/ /bar/'
gawk: cmd. line:2: /foo/ /bar/
gawk: cmd. line:2:            ^ unexpected newline or end of string
$ echo -e "foo\nbar" | gawk '/foo/; /bar/'
foo
bar
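A newline should work as the separator here as well, since the grammar accepts either a semicolon or a newline between rules. A minimal sketch using plain awk and printf in place of gawk and echo -e (both substitutions are assumptions made for portability):

```shell
# Two pattern-only rules separated by a newline instead of a semicolon.
out=$(printf 'foo\nbar\n' | awk '/foo/
/bar/')
printf '%s\n' "$out"
```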

Tags: Awk, Gawk