Why is egrep ignoring the negative whitespace?

The complement of \s is \S, not [^\s]. Inside a bracket expression, \s is just a literal backslash and a literal s, so [^\s] (with the help of -i) excluded 'SIX' and 'Sam' from the result because they contain an s.
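A minimal sketch of the difference, using one host line as sample input. The variable names are only illustrative, and the result assumes a grep that supports -o (GNU and BSD Grep both do):

```shell
sample='Host opengrok-02-SIX'

# Inside brackets the backslash is literal, so [^\s] means
# "any character except a backslash or an s" (with -i, also except S).
broken=$(printf '%s\n' "$sample" | grep -Eio '[^\s]+' | tr '\n' '|')

# A negated POSIX class is the portable complement of whitespace.
portable=$(printf '%s\n' "$sample" | grep -Eio '[^[:space:]]+' | tr '\n' '|')

printf 'broken:   %s\nportable: %s\n' "$broken" "$portable"
```

The first pattern splits the line at every s or S, while the second splits it only at whitespace.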


To grep -i for lines that start with "host", followed by one or more whitespace characters and then one or more characters up to the end of the line, none of which is a literal * or whitespace:

grep -Ei '^host[[:space:]]+[^*[:space:]]+$' file
Host opengrok-01-Eight
Host opengrok-02-SIX
Host opengrok-03-forMe
Host opengrok-04-ForSam
Host opengrok-05-Okay
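
To see what the pattern rejects as well as what it accepts, here is a small sketch with made-up input. The wildcard, two-word, and HostName lines are hypothetical additions, not lines from your file:

```shell
# Hypothetical ssh-config-like lines; only the two opengrok entries should match.
matches=$(printf '%s\n' \
  'Host *' \
  'Host opengrok-02-SIX' \
  'Host opengrok-04-ForSam' \
  'Host two words' \
  '  HostName example.invalid' |
  grep -Ei '^host[[:space:]]+[^*[:space:]]+$')

printf '%s\n' "$matches"
```

The wildcard entry fails on the *, the two-word entry fails on the inner space, and the indented HostName line fails on the ^host anchor.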

Interpreting \s as whitespace is a GNU Grep extension; it is not defined in POSIX. BSD Grep, for example, does not identify \s as whitespace. Perl regexes (-P) are also an extension to POSIX, and not every grep build supports them. For a totally portable expression, you should use [[:space:]] instead.

The GNU Grep manual states somewhat loosely that "most meta-characters lose their special meaning inside bracket expressions." You have found that \s is affected too: POSIX specifies that the special characters ., *, [ and \ lose their special meaning within a bracket expression, so [^\s] means "anything but a backslash or an s". You can still portably use [:space:] inside the brackets.
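
The POSIX rule is easy to check with a few throwaway inputs. This is only a sketch; the sample strings and variable names are invented for illustration:

```shell
# '.' and '*' are literal inside brackets: only the lines that
# really contain a dot or a star between a and b are counted.
dot_hits=$(printf 'a.b\naxb\n' | grep -c 'a[.]b')
star_hits=$(printf 'a*b\naab\n' | grep -c 'a[*]b')

# The backslash is literal too: [\s] matches a backslash or an s,
# so "a b" is not counted but "a\b" and "asb" are.
bs_hits=$(printf 'a b\na\\b\nasb\n' | grep -c 'a[\s]b')

# The POSIX class is what actually matches the space.
space_hits=$(printf 'a b\na\\b\nasb\n' | grep -c 'a[[:space:]]b')

printf 'dot=%s star=%s backslash-or-s=%s space=%s\n' \
  "$dot_hits" "$star_hits" "$bs_hits" "$space_hits"
```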

So, answering your two questions,

How should I be writing my grep -E for this case?

grep -Ei '^host[[:space:]]+[^*[:space:]]+[[:space:]]*$'
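
This version differs from the earlier one only in the trailing [[:space:]]*, which tolerates whitespace at the end of the line. A quick sketch of the difference, where the sample line with trailing spaces is an assumption and not taken from your file:

```shell
strict='^host[[:space:]]+[^*[:space:]]+$'
lenient='^host[[:space:]]+[^*[:space:]]+[[:space:]]*$'

line='Host opengrok-05-Okay   '   # note the three trailing spaces

# grep -c prints the number of matching lines; "|| true" keeps a
# zero-match exit status from aborting scripts that run under set -e.
strict_hits=$(printf '%s\n' "$line" | grep -Eic "$strict" || true)
lenient_hits=$(printf '%s\n' "$line" | grep -Eic "$lenient" || true)

printf 'strict=%s lenient=%s\n' "$strict_hits" "$lenient_hits"
```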

Are there any other nasty gotchas that I have been missing with -E or that will bite me if I use -P?

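A common mistake is to try the Perl non-greedy .*? without the -P flag. With -E, GNU Grep treats the ? as one more quantifier applied to .*, so the match stays greedy; with no flag (a basic regex), ? is a literal question mark.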

$ echo 'AB 14 34' | grep -Eo '^.*?4'
AB 14 34
$ echo 'AB 14 34' | grep -Po '^.*?4'
AB 14
$ echo 'AB 14 34' | grep -o  '^.*?4'
(no output)
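
If -P is not available, a negated bracket expression is a common portable stand-in for this particular non-greedy match. A sketch on the same input (the variable name is only illustrative):

```shell
# "Anything that is not a 4, then a 4" stops at the first 4,
# which is what the non-greedy .*?4 was meant to do.
shortest=$(echo 'AB 14 34' | grep -Eo '^[^4]*4')

printf '%s\n' "$shortest"
```

This only works when the terminator is a single character; a non-greedy match up to a multi-character terminator generally still needs -P or a different tool.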

The final word: BRE, ERE and PCRE are all different. Know your regexes!