What Unicode characters are dangerous?

A golden rule in security is to whitelist rather than blacklist. Instead of trying to enumerate every bad character, it is a much better idea to validate that the user supplies only known-good characters.

There are tools that help you build the large whitelist that international text requires. In .NET, for example, there is the UnicodeCategory enumeration.

The idea is that instead of whitelisting thousands of individual characters, you rely on the categories the library already assigns them to, such as letters, digits, punctuation, and control characters, and whitelist whole categories at once.
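
To make the category idea concrete, here is a minimal sketch in Python rather than .NET, using the standard library's unicodedata.category, which returns the two-letter Unicode general category of a character. The set of allowed categories and the helper names (ALLOWED_CATEGORIES, is_allowed, rejected_characters) are assumptions chosen for illustration, not a recommended policy; what counts as "known good" depends on your application.

```python
import unicodedata

# Assumed policy for illustration: letters, combining marks, decimal digits,
# and the plain space separator. Adjust this set to your own requirements.
ALLOWED_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo",  # letters
    "Mn", "Mc",                    # combining marks (accents, vowel signs)
    "Nd",                          # decimal digits
    "Zs",                          # space separators
}


def is_allowed(text: str) -> bool:
    """Return True if every character falls in a whitelisted category.

    Note that an empty string passes; check length separately if needed.
    """
    return all(unicodedata.category(ch) in ALLOWED_CATEGORIES for ch in text)


def rejected_characters(text: str) -> list[tuple[str, str]]:
    """Return (character, category) pairs for everything outside the whitelist."""
    return [
        (ch, unicodedata.category(ch))
        for ch in text
        if unicodedata.category(ch) not in ALLOWED_CATEGORIES
    ]
```

With this policy, a name such as "José Müller" passes, while input containing a control character, a format character such as U+202E (right-to-left override), or a symbol such as "<" is rejected. The second helper is only there to show which characters failed and why, which is useful when tuning the whitelist.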

Tutorial on whitelisting international characters in .NET

Unicode Regex: Categories


Characters aren’t dangerous: only inappropriate uses of them are.

You might consider reading things like:

  • Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax
  • RFC 3454: Preparation of Internationalized Strings (“stringprep”)

Without more context, it is impossible to guess what you mean by “dangerous”.