How to identify zero-width character?

I have a little bit of Javascript embedded within my explanation of Unicode which allows you to see the Unicode characters you copy/paste into a textbox. Your example looks like this:

Unicode explorer

Here you can see that the character is U+200B. Just searching for that will normally lead you to http://www.fileformat.info, in this case this page which can give you details of the character.

If you have the characters yourself within an application, Char.GetUnicodeCategory is your friend. (Oddly enough, there's no Char.GetUnicodeCategory(int) for non-BMP characters as far as I can see...)


According to similar question: Remove zero-width space characters from a JavaScript string

I'd hit ctrl+f (or ctrl+h) and turn on Regexp option, then search (or search-replace) for:

[\u200B-\u200D\uFEFF]

I've just tried your example and successfully replaced that zero-width space with "X" mark.

Just please note that this range covers only a few specific characters as explained in that post, not all invisible characters.

edit - thanks to this page I've found a better expression that seems nicely supported in the "find/replace" when Regexp option is turned on:

\p{Cf}

which seems to matches invisible characters, it successfully hit that one in your example, though I'm not exactly sure if it covers all you'd need. It may be worth playing with whole {C}-class or searching for whitespace|nonprintable plus negative match for {Z}-class (or {Zs}) negation.