awk repetition {n} is not working

EREs (extended regular expressions as used by awk or egrep) initially didn't have {x,y}. It was first introduced in BREs (as used by grep or sed), but with the \{x,y\} syntax that didn't break backward portability.

But when it was added to EREs with the unescaped {x,y} syntax, it did break backward portability, as an RE like foo{2} had previously matched the literal string foo{2}.

So some implementations chose not to do it. You'll find that /bin/awk, /bin/nawk and /bin/egrep on Solaris still don't honour it (you need to use /usr/xpg4/bin/awk or /usr/xpg4/bin/grep -E). Same for awk and nawk on FreeBSD (based on the awk maintained by Brian Kernighan (the k in awk)).

For GNU awk, until relatively recently (version 4.0), you had to call it with POSIXLY_CORRECT=anything awk '/^.{4}$/' for it to honour it. mawk still doesn't honour it.

Note that that operator is only syntactic sugar. .{3,5} can always be written ....?.? for instance (though of course {3,5} is a lot more legible, and the equivalent of (foo.{5,9}bar){123,456} would be a lot worse).
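As a minimal sketch of that equivalence, assuming ASCII input like the sample lines used elsewhere on this page, the {4} from the question can be spelled out by hand or replaced by a length test. Neither form needs interval support, so both should behave the same in any awk:

```shell
# Spell out the interval: four "any character" atoms between the anchors.
printf 'abcd\nabc\nabcde\n' | awk '/^....$/'
# prints: abcd

# Or skip the regex and compare the record length instead.
printf 'abcd\nabc\nabcde\n' | awk 'length($0) == 4'
# prints: abcd
```

One caveat on the second form: some implementations may count bytes rather than characters in length(), so the two can disagree on multibyte input.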


According to The GNU Awk User's Guide: Feature History, support for regular expression range operators was added in version 3.0 but initially required an explicit command-line option:

  • New command-line options:
    • The --lint-old option to warn about constructs that are not available in the original Version 7 Unix version of awk (see V7/SVR3.1).
    • The -m option from BWK awk. (Brian was still at Bell Laboratories at the time.) This was later removed from both his awk and from gawk.
    • The --re-interval option to provide interval expressions in regexps (see Regexp Operators).
    • The --traditional option was added as a better name for --compat (see Options).

In gawk 4.0, interval expressions became part of default regular expressions.

Since you are using gawk 3.x, you will need to use

awk --re-interval '/^.{4}$/'

or

awk --posix '/^.{4}$/'

or (thanks @StéphaneChazelas) if you want a solution that is portable, use

POSIXLY_CORRECT=anything awk '/^.{4}$/'

(since --posix or --re-interval would cause an error in other awk implementations).
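If a script has to run under whichever awk happens to be installed, one option is to probe for interval support first and fall back to the spelled-out pattern. The following is only a sketch: the function name awk_has_intervals and the variable pattern are made up for illustration, and it assumes the awk under test treats dynamic regexps (`$0 ~ re`) the same way as regexp literals.

```shell
# Succeeds (exit status 0) if the given awk treats {4} as an interval expression.
# Without interval support, /^a{4}$/ only matches the literal text "a{4}".
awk_has_intervals() {
    printf 'aaaa\n' | "${1:-awk}" '/^a{4}$/ { ok = 1 } END { exit !ok }'
}

if awk_has_intervals awk; then
    pattern='^.{4}$'    # interval expressions are honoured
else
    pattern='^....$'    # portable spelled-out equivalent
fi

printf 'abcd\nabc\nabcde\n' | awk -v re="$pattern" '$0 ~ re'
# prints: abcd
```

Either branch should select the same lines; the probe only decides which spelling of the pattern is used.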


This works as expected with GNU awk (gawk):

$ printf 'abcd\nabc\nabcde\n' | gawk '/^.{4}$/'
abcd

But it fails with mawk, which, AFAIK, is the default awk on Ubuntu systems and does not implement interval expressions:

$ printf 'abcd\nabc\nabcde\n' | mawk '/^.{4}$/'
$ ## prints nothing

So, a simple solution would be to use gawk instead of awk. The unescaped {n} notation isn't part of the POSIX BRE (basic regular expression) syntax, where the interval has to be written \{n\}. That's why grep also fails here:

$ printf 'abcd\nabc\nabcde\n' | grep '^.{4}$'
$

However, it is part of ERE (extended regular expressions):

$ printf 'abcd\nabc\nabcde\n' | grep -E '^.{4}$'
abcd
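For comparison, plain grep (BRE) accepts the same interval once the braces are backslash-escaped. A quick check on the same sample input:

```shell
# BRE: the interval is written \{n\}, with backslash-escaped braces.
printf 'abcd\nabc\nabcde\n' | grep '^.\{4\}$'
# prints: abcd
```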

According to Stéphane's answer, mawk and some other awk implementations use an older version of ERE that lacks interval expressions. In any case, either you are using a version of awk whose ERE doesn't implement {n}, or your input doesn't actually have any lines with exactly 4 characters. The latter could happen because of whitespace that you don't see or Unicode glyphs, for example.
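To check for invisible characters, it can help to print each line's length and to make non-printing characters visible. The sample input below is an assumption chosen for illustration: one line with a CRLF ending and one with a trailing space, both of which look like "abcd" on screen.

```shell
# Each line looks like "abcd" but is actually 5 characters long.
printf 'abcd\r\nabcd \n' | awk '{ print length($0) }'
# prints: 5
#         5

# sed's l command shows escapes for non-printing characters and marks line ends with $.
printf 'abcd\r\nabcd \n' | sed -n l
# prints: abcd\r$
#         abcd $
```

If the lengths are not what you expect, stripping the stray characters first should let the 4-character pattern match.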