Grep: The asterisk (*) doesn't always work

An asterisk in regular expressions means "match the preceding element 0 or more times".

In your particular case with grep 'This*String' file.txt, you are trying to say, "hey, grep, match me the word Thi, followed by lowercase s zero or more times, followed by the word String". In ThisExampleString, the text between Thi and String is sExample, which is not just a run of s characters, hence grep ignores ThisExampleString.

In the case of grep '*String' file.txt, the asterisk has nothing before it to repeat. In a basic regular expression (grep's default), a leading * is taken as a literal asterisk, so grep looks for the text *String. With the -E flag, POSIX leaves the meaning of a leading * undefined, and an implementation may read it as "the empty string, literally nothing, preceding the word String". Of course, that's not how ThisExampleString is supposed to be read, and none of these meanings are anything like what you really want here.
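To see what default grep does with a leading asterisk, here is a minimal sketch. The two sample lines are made up for illustration, and it assumes a POSIX-conforming grep in its default (basic regular expression) mode, where a leading * is treated as a literal character.

```shell
# Hypothetical sample data: one line contains a literal asterisk, one does not.
sample=$(printf '%s\n' 'ThisExampleString' 'a*String')

# In a basic regular expression a leading * is literal, so only the line
# that really contains the text "*String" is selected.
leading_star=$(printf '%s\n' "$sample" | grep '*String')

# '.*String' means "any characters, then String"; -c counts matching lines.
dot_star_count=$(printf '%s\n' "$sample" | grep -c '.*String')

printf 'leading star matched: %s\n' "$leading_star"
printf 'lines matched by .*String: %s\n' "$dot_star_count"
```

The first pattern never selects ThisExampleString, while the second selects both lines.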

Knowing that . means "any single character", we could do this: grep 'This.*String' file.txt. Now the grep command will read it correctly: This, followed by any character (a letter, a digit, punctuation and so on) repeated any number of times, including zero, followed by String.


The * metacharacter in BREs, EREs and PCREs [1] matches 0 or more occurrences of whatever immediately precedes it: the preceding group (if a grouped pattern precedes the *), the preceding character class (if a character class precedes the *), or otherwise the preceding single character.

This means that in the This*String pattern, since the * is preceded by neither a grouped pattern nor a character class, it matches 0 or more occurrences of the previous character (in this case the s character):

% cat infile               
ThisExampleString
ThisString
ThissString
% grep 'This*String' infile
ThisString
ThissString
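The other two cases, a * after a group and a * after a character class, can be sketched the same way. The sample lines are hypothetical, and -E is used for the group only so that the parentheses need no backslashes (in a BRE the same group would be written \(Is\)).

```shell
# Hypothetical sample lines, chosen only for illustration.
lines=$(printf '%s\n' 'ThisString' 'ThisIsIsString' 'This123String' 'ThisExampleString')

# Group: (Is)* repeats the whole group "Is" zero or more times.
group_matches=$(printf '%s\n' "$lines" | grep -E 'This(Is)*String')

# Character class: [0-9]* repeats "any digit" zero or more times.
class_matches=$(printf '%s\n' "$lines" | grep 'This[0-9]*String')

printf 'group:\n%s\n' "$group_matches"
printf 'class:\n%s\n' "$class_matches"
```

ThisString is selected by both patterns because "zero or more" includes zero; ThisExampleString is selected by neither.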

To match 0 or more occurrences of any character, you want to match 0 or more occurrences of the . metacharacter, which matches any single character:

% cat infile               
ThisExampleString
% grep 'This.*String' infile
ThisExampleString

The * metacharacter in BREs and EREs is always "greedy", i.e. it matches as many characters as it can, so the overall match is the longest possible one:

% cat infile
ThisExampleStringIsAString
% grep -o 'This.*String' infile
ThisExampleStringIsAString

This may not be the desired behavior; if it's not, you can turn on grep's PCRE engine (using the -P option, which is available in GNU grep) and append the ? metacharacter, which, when put after the * and + metacharacters, makes them non-greedy, so that they match as few characters as possible:

% cat infile
ThisExampleStringIsAString
% grep -Po 'This.*?String' infile
ThisExampleString

1: Basic Regular Expressions, Extended Regular Expressions and Perl Compatible Regular Expressions


One explanation, found here (link):

Asterisk "*" does not mean the same thing in regular expressions as in wildcarding; it is a modifier that applies to the preceding single character, or expression such as [0-9]. An asterisk matches zero or more of what precedes it. Thus [A-Z]* matches any number of upper-case letters, including none, while [A-Z][A-Z]* matches one or more upper-case letters.
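A minimal sketch of the difference between [A-Z]* and [A-Z][A-Z]*. The three input lines and the helper function input are made up for illustration, and the ^ and $ anchors are added so that the whole line has to match; without them, [A-Z]* would match the empty string at the start of every line.

```shell
# Hypothetical input: an upper-case line, an empty line, and a lower-case line.
input() { printf '%s\n' 'ABC' '' 'abc'; }

# LC_ALL=C pins the range A-Z to ASCII order; in other locales it can differ.
# ^[A-Z]*$ accepts zero letters, so the empty line counts as well as ABC.
zero_or_more=$(input | LC_ALL=C grep -c '^[A-Z]*$')

# ^[A-Z][A-Z]*$ demands one upper-case letter first, so only ABC counts.
one_or_more=$(input | LC_ALL=C grep -c '^[A-Z][A-Z]*$')

printf 'zero or more: %s\none or more: %s\n' "$zero_or_more" "$one_or_more"
```

By contrast, in shell wildcarding a bare * stands on its own for "any run of characters", which is the role .* plays in a regular expression.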