Removing hidden characters from within strings

I usually use this regular expression to replace all non-printable characters.

By the way, most people consider tab, line feed and carriage return to be non-printable characters, but I don't, so the expression below keeps them.

So here is the expression:

    string output = Regex.Replace(input, @"[^\u0009\u000A\u000D\u0020-\u007E]", "*");

  • ^ as the first character inside [ ] negates the set, so the expression matches any character that is not one of the following:
  • \u0009 is tab
  • \u000A is linefeed
  • \u000D is carriage return
  • \u0020-\u007E means everything from space to ~, that is, every printable ASCII character.

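See an ASCII table if you want to make changes. Remember that the expression also matches every non-ASCII character, so those get replaced too. Pass an empty string instead of "*" if you want to strip the matches rather than mask them.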
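For anyone who wants to try the expression in isolation, here is a minimal sketch. The class name HiddenCharDemo and the Clean helper are made up for illustration; only Regex.Replace and the pattern come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HiddenCharDemo
{
    // Matches anything that is NOT tab, LF, CR or printable ASCII (space through ~).
    private static readonly Regex NonPrintable =
        new Regex(@"[^\u0009\u000A\u000D\u0020-\u007E]");

    // Hypothetical helper: replaces each match with `replacement`.
    // Pass "" to strip the characters instead of masking them.
    public static string Clean(string input, string replacement = "*")
    {
        return NonPrintable.Replace(input, replacement);
    }

    public static void Main()
    {
        // a, NUL, b, zero-width space, c, tab, d, e-acute
        string input = "a\u0000b\u200Bc\td\u00E9";

        Console.WriteLine(Clean(input) == "a*b*c\td*");   // True
        Console.WriteLine(Clean(input, "") == "abc\td");  // True
    }
}
```

The tab survives in both cases because \u0009 is listed inside the negated set, while the accented letter is treated like any other non-ASCII character.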

To test the expression above, you can build an input string yourself like this:

    string input = string.Empty;

    for (int i = 0; i < 255; i++)
    {
        input += (char)(i);
    }
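As a rough check, the test string can be run through the expression and the matches counted. This is only a sketch: TestStringDemo, BuildTestString and CountReplaced are names invented here, and StringBuilder is swapped in for the += concatenation purely as a matter of habit.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class TestStringDemo
{
    private const string Pattern = @"[^\u0009\u000A\u000D\u0020-\u007E]";

    // Builds a string holding the characters U+0000 through U+00FE,
    // the same range as the loop above (i < 255).
    public static string BuildTestString()
    {
        var sb = new StringBuilder();
        for (int i = 0; i < 255; i++)
        {
            sb.Append((char)i);
        }
        return sb.ToString();
    }

    // Counts how many characters the expression would replace.
    public static int CountReplaced(string input)
    {
        return Regex.Matches(input, Pattern).Count;
    }

    public static void Main()
    {
        string input = BuildTestString();
        string output = Regex.Replace(input, Pattern, "*");

        Console.WriteLine(output.Length);         // 255 (one '*' per replaced char)
        Console.WriteLine(CountReplaced(input));  // 157
    }
}
```

Of the 255 characters, 98 are kept (tab, line feed, carriage return and the 95 characters from space to ~), which leaves 157 replaced.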

You can remove all control characters from your input string with something like this:

    // requires: using System.Linq;
    string input = "..."; // this is your input string
    string output = new string(input.Where(c => !char.IsControl(c)).ToArray());

Here is the documentation for the IsControl() method.
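One thing worth knowing before using this: char.IsControl also returns true for tab, line feed and carriage return, so they are removed along with everything else, whereas invisible format characters such as the zero-width space (U+200B) are not control characters and stay in. The sketch below shows both behaviours; ControlCharDemo and RemoveControlKeepWhitespace are hypothetical names, and the second method is just one possible way to keep the three whitespace characters.

```csharp
using System;
using System.Linq;

public static class ControlCharDemo
{
    // Same approach as above: drop every character for which char.IsControl is true.
    public static string RemoveControl(string input)
    {
        return new string(input.Where(c => !char.IsControl(c)).ToArray());
    }

    // Hypothetical variant that keeps tab, LF and CR.
    public static string RemoveControlKeepWhitespace(string input)
    {
        return new string(input
            .Where(c => !char.IsControl(c) || c == '\t' || c == '\n' || c == '\r')
            .ToArray());
    }

    public static void Main()
    {
        string input = "a\tb\u0001c\r\n";

        Console.WriteLine(RemoveControl(input) == "abc");                      // True
        Console.WriteLine(RemoveControlKeepWhitespace(input) == "a\tbc\r\n");  // True

        // The zero-width space is a format character, not a control character.
        Console.WriteLine(RemoveControl("a\u200Bb").Length);                   // 3
    }
}
```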

Or, if you want to keep only letters and digits, you can use the IsLetter and IsDigit methods:

    string output = new string(input.Where(c => char.IsLetter(c) || char.IsDigit(c)).ToArray());
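Keep in mind that IsLetter and IsDigit are Unicode-aware, so accented letters and non-Latin digits pass the filter as well. A small sketch of the difference follows; LetterDigitDemo and both method names are invented for this example, char.IsLetterOrDigit is used as what should be an equivalent shorthand for the two calls, and the ASCII-only check is an assumption about what some readers may actually want.

```csharp
using System;
using System.Linq;

public static class LetterDigitDemo
{
    // Unicode-aware: keeps any letter or decimal digit, not only ASCII ones.
    public static string KeepLettersAndDigits(string input)
    {
        return new string(input.Where(c => char.IsLetterOrDigit(c)).ToArray());
    }

    // Hypothetical stricter variant: keeps only a-z, A-Z and 0-9.
    public static string KeepAsciiLettersAndDigits(string input)
    {
        return new string(input.Where(c =>
            (c >= 'a' && c <= 'z') ||
            (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9')).ToArray());
    }

    public static void Main()
    {
        // "Hello, world 42!" with e-acute and o-umlaut
        string input = "H\u00E9llo, w\u00F6rld 42!";

        Console.WriteLine(KeepLettersAndDigits(input) == "H\u00E9llow\u00F6rld42");  // True
        Console.WriteLine(KeepAsciiLettersAndDigits(input));                         // Hllowrld42
    }
}
```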