Strip Byte Order Mark from string in C#

I recently had issues with the .net 4 upgrade but until then the simple answer is

String.Trim()

removes the BOM up until .net 3.5 However in .net 4 you need to change it slightly

String.Trim(new char[]{'\uFEFF'});

That will also get rid of the Byte order mark, though you may also want to remove the ZERO WIDTH SPACE U+200B

String.Trim(new char[]{'\uFEFF','\u200B'});

This you could also use to remove other unwanted characters

Some further information from http://msdn.microsoft.com/en-us/library/t97s7bs3.aspx

The .NET Framework 3.5 SP1 and earlier versions maintain an internal list of white-space characters that this method trims. Starting with the .NET Framework 4, the method trims all Unicode white-space characters (that is, characters that produce a true return value when they are passed to the Char.IsWhiteSpace method). Because of this change, the Trim method in the .NET Framework 3.5 SP1 and earlier versions removes two characters, ZERO WIDTH SPACE (U+200B) and ZERO WIDTH NO-BREAK SPACE (U+FEFF), that the Trim method in the .NET Framework 4 and later versions does not remove. In addition, the Trim method in the .NET Framework 3.5 SP1 and earlier versions does not trim three Unicode white-space characters: MONGOLIAN VOWEL SEPARATOR (U+180E), NARROW NO-BREAK SPACE (U+202F), and MEDIUM MATHEMATICAL SPACE (U+205F).


I had some incorrect test data, which caused me some confusion. Based on How to avoid tripping over UTF-8 BOM when reading files I found that this worked:

private readonly string _byteOrderMarkUtf8 =
    Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());

public string GetXmlResponse(Uri resource)
{
    string xml;

    using (var client = new WebClient())
    {
        client.Encoding = Encoding.UTF8;
        xml = client.DownloadString(resource);
    }

    if (xml.StartsWith(_byteOrderMarkUtf8, StringComparison.Ordinal))
    {
        xml = xml.Remove(0, _byteOrderMarkUtf8.Length);
    }

    return xml;
}

Setting the client Encoding property correctly reduces the BOM to a single character. However, XDocument.Parse still will not read that string. This is the cleanest version I've come up with to date.


This works as well

int index = xmlResponse.IndexOf('<');
if (index > 0)
{
    xmlResponse = xmlResponse.Substring(index, xmlResponse.Length - index);
}