What difference does it make matching a word with/without a trailing whitespace?

It's a cheap and error-prone way of doing word matching.

Note that "the " (with a space after it) does not match inside the word "thereby", so including the trailing space avoids matching that string at the start of longer words. However, it still matches the end of "bathe" (if followed by a space), and it does not match "the" at the end of a line.
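These three cases can be checked directly. This is a minimal sketch, and the sample inputs are made up for illustration:

```shell
# "thereby" has no space right after "the", so nothing is replaced
printf '%s\n' 'thereby' | sed 's/the /this /'
# → thereby

# "bathe " ends in "the ", so it is rewritten even though it is another word
printf '%s\n' 'bathe daily' | sed 's/the /this /'
# → bathis daily

# "the" at the end of the line has no trailing space, so it is missed
printf '%s\n' 'over the' | sed 's/the /this /'
# → over the
```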

To match the word "the" properly (or any other word), do not put spaces around it in the pattern: that prevents a match at the start or end of a line, or when the word is flanked by any other non-word character, such as punctuation or a tab.

Instead, use a zero-width word boundary pattern:

sed 's/\<the\>/this/'

The \< and \> match the boundaries before and after the word, i.e. the zero-width positions between a word character and a non-word character (or the start or end of the line). A word character is generally any character matching [[:alnum:]_] (or [A-Za-z0-9_] in the POSIX locale).
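A short sketch of the boundary behaviour, assuming a sed that supports \< and \> (GNU sed does). The sample inputs are invented for illustration:

```shell
# Embedded matches in "bathe" and "thereby" are left alone;
# only the standalone word is replaced
printf '%s\n' 'bathe the thereby' | sed 's/\<the\>/this/'
# → bathe this thereby

# Punctuation and the end of the line count as boundaries too
printf '%s\n' 'the, and the' | sed 's/\<the\>/this/g'
# → this, and this
```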

With GNU sed, you could also use \b in place of \< and \>:

sed 's/\bthe\b/this/'

The difference is whether there is a space after "the" in the input text.
For instance:

With input that has no space after "the", no replacement occurs:

$ echo 'theman' | sed 's/the /this /'
theman

With input that has a space after "the", it works as expected:

$ echo 'the man' | sed 's/the /this /'
this man

With a different whitespace character after "the" (here a tab), no replacement occurs:

$ echo -e 'the\tman' | sed 's/the /this /'
the     man

sed works with regular expressions. Using sed 's/the /this /' you just make the space after "the" part of the matched pattern.

Using sed 's/the/this/' you replace "the" with "this" whether or not a space follows it (only the first occurrence on each line, unless you add the g flag).
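Note that a substitution applies only to the first match on each line unless the g flag is given. A small sketch, with a sample input chosen for illustration:

```shell
# Without g: only the first "the" on the line is replaced
printf '%s\n' 'the other then' | sed 's/the/this/'
# → this other then

# With g: every "the" is replaced, including those inside other words
printf '%s\n' 'the other then' | sed 's/the/this/g'
# → this othisr thisn
```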

In the HackerRank exercise, the result is the same because replacing "the" with "this" there affects only the article "the", which in normal prose is followed by a space.

You can see the difference if you try, for example, to capitalize "the" in the phrase "the theater":

echo 'the theater' | sed 's/the /THE /g'
THE theater
# "theater" is left alone since its "the" is not followed by a space

echo 'the theater' | sed 's/the/THE/g'
THE THEater
# both occurrences of "the" are capitalized
