grep - regex that will find exactly 3 a's in a string

Here's one way with [^a] (match any character other than a) instead of . (match any character):

$ grep -E '^([^a]*a){3}[^a]*$' /usr/share/dict/cracklib-small | shuf -n 4
areaway
humanitarian
capitalizations
autonavigator

You can also write the regexp like ^[^a]*(a[^a]*){3}$ with the same results.

It's also equivalent to ^[^a]*a[^a]*a[^a]*a[^a]*$, which doesn't scale when you want a different number of a's. The unrolled form performs much better though, not that it matters unless you're grepping through gigabytes of data.

Instead of explicitly using the ^ and $ regexp anchor operators, you can also use the -x option, which anchors the pattern to the whole line implicitly. See also the -i option to match case-insensitively (according to locale):

grep -xiE '([^a]*a){3}[^a]*'
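As a quick check of the anchored and -x forms, here is a minimal sketch on a made-up word list. The sample, exact3 and exact3_nocase functions and the words themselves are assumptions for illustration, not taken from the dictionary files used here.

```shell
# Hypothetical sample input, one word per line (not from a real dictionary).
sample() {
  printf '%s\n' banana Alabama bandana cat aaa aaaa
}

# Explicit anchors, case-sensitive: "Alabama" has three lowercase a's, so it matches.
exact3() {
  grep -E '^([^a]*a){3}[^a]*$'
}

# -x anchors the whole line and -i folds case, so the leading "A" of
# "Alabama" counts as a fourth a and the word is dropped.
exact3_nocase() {
  grep -xiE '([^a]*a){3}[^a]*'
}

sample | exact3
sample | exact3_nocase
```

The only difference between the two runs should be "Alabama", which is a handy way to see what -i changes inside the [^a] bracket expression.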

Match lines with at least three a's, then use the same sort of pattern to detect "at least four a's" and invert the sense of that second match with -v:

grep 'a.*a.*a' /usr/share/dict/words | grep -v 'a.*a.*a.*a'

or,

grep '\(a.*\)\{3\}' /usr/share/dict/words | grep -v '\(a.*\)\{4\}'

or,

grep -E '(a.*){3}' /usr/share/dict/words | grep -v -E '(a.*){4}'
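The two-stage filter generalizes to any count. The sketch below wraps it in a hypothetical helper, exactly_n (not part of grep), and assumes the character passed in is not a regex metacharacter.

```shell
# Hypothetical helper: read lines on stdin and print those containing
# exactly N occurrences of the single character CH.
# Assumption: CH is a plain character, not a regex metacharacter.
exactly_n() {
  ch=$1
  n=$2
  # Keep lines with at least N occurrences, then drop those with N + 1 or more.
  grep -E "($ch.*){$n}" | grep -v -E "($ch.*){$((n + 1))}"
}

printf '%s\n' banana cat aaaa bandana | exactly_n a 3
```

With this made-up input, only the lines that pass the first grep and fail the second one are printed.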

Alternatively, use awk with a as the field delimiter and count the fields:

awk -F a 'NF == 4' /usr/share/dict/words

(A line with three a's splits into four fields.)
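To see the field counting at work, this small sketch prints each line of a made-up sample next to its NF value. The fields_by_a name and the sample words are assumptions for illustration.

```shell
# Hypothetical helper: print each input line followed by the number of
# fields awk finds when splitting on the literal character "a".
# N a's yield N + 1 fields; empty fields between adjacent a's still count.
fields_by_a() {
  awk -F a '{ print $0, NF }'
}

printf '%s\n' banana cat aaaa | fields_by_a
```

Note that the split is case-sensitive, so an uppercase A is not treated as a delimiter here.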

Alternatively, use Perl's tr operator to count the number of a's on each line:

perl -ne 'print if (tr/a/a/ == 3)' /usr/share/dict/words

The operator returns the number of transliterations made, and we're replacing each a with another a, so the actual output would not be modified.
