Server.UrlEncode vs. HttpUtility.UrlEncode

Fast-forward almost 9 years since this was first asked, and in the world of .NET Core and .NET Standard, it seems the most common options we have for URL-encoding are WebUtility.UrlEncode (under System.Net) and Uri.EscapeDataString. Judging by the most popular answer here and elsewhere, Uri.EscapeDataString appears to be preferable. But is it? I did some analysis to understand the differences and here's what I came up with:

WebUtility.UrlEncode encodes space as +; Uri.EscapeDataString encodes it as %20.
Uri.EscapeDataString percent-encodes !, (, ), and *; WebUtility.UrlEncode does not.
WebUtility.UrlEncode percent-encodes ~; Uri.EscapeDataString does not.
Uri.EscapeDataString throws a UriFormatException on strings longer than 65,520 characters; WebUtility.UrlEncode does not. (A more common problem than you might think, particularly when dealing with URL-encoded form data.)
Uri.EscapeDataString throws a UriFormatException on the high surrogate characters; WebUtility.UrlEncode does not. (That's a UTF-16 thing, probably a lot less common.)

For URL-encoding purposes, characters fit into one of 3 categories: unreserved (legal in a URL); reserved (legal in but has special meaning, so you might want to encode it); and everything else (must always be encoded).

According to the RFC, the reserved characters are: :/?#[]@!$&'()*+,;=

And the unreserved characters are alphanumeric and -._~

The Verdict

Uri.EscapeDataString clearly defines its mission: %-encode all reserved and illegal characters. WebUtility.UrlEncode is more ambiguous in both definition and implementation. Oddly, it encodes some reserved characters but not others (why parentheses and not brackets??), and stranger still it encodes that innocently unreserved ~ character.

Therefore, I concur with the popular advice - use Uri.EscapeDataString when possible, and understand that reserved characters like / and ? will get encoded. If you need to deal with potentially large strings, particularly with URL-encoded form content, you'll need to either fall back on WebUtility.UrlEncode and accept its quirks, or otherwise work around the problem.

EDIT: I've attempted to rectify ALL of the quirks mentioned above in Flurl via the Url.Encode, Url.EncodeIllegalCharacters, and Url.Decode static methods. These are in the core package (which is tiny and doesn't include all the HTTP stuff), or feel free to rip them from the source. I welcome any comments/feedback you have on these.

Here's the code I used to discover which characters are encoded differently:

var diffs =
    from i in Enumerable.Range(0, char.MaxValue + 1)
    let c = (char)i
    where !char.IsHighSurrogate(c)
    let diff = new {
        Original = c,
        UrlEncode = WebUtility.UrlEncode(c.ToString()),
        EscapeDataString = Uri.EscapeDataString(c.ToString()),
    }
    where diff.UrlEncode != diff.EscapeDataString
    select diff;

foreach (var diff in diffs)
    Console.WriteLine($"{diff.Original}\t{diff.UrlEncode}\t{diff.EscapeDataString}");

HttpServerUtility.UrlEncode will use HttpUtility.UrlEncode internally. There is no specific difference. The reason for existence of Server.UrlEncode is compatibility with classic ASP.

I had significant headaches with these methods before, I recommend you avoid any variant of UrlEncode, and instead use Uri.EscapeDataString - at least that one has a comprehensible behavior.

Let's see...

HttpUtility.UrlEncode(" ") == "+" //breaks ASP.NET when used in paths, non-
                                  //standard, undocumented.
Uri.EscapeUriString("a?b=e") == "a?b=e" // makes sense, but rarely what you
                                        // want, since you still need to
                                        // escape special characters yourself

But my personal favorite has got to be HttpUtility.UrlPathEncode - this thing is really incomprehensible. It encodes:

" " ==> "%20"
"100% true" ==> "100%%20true" (ok, your url is broken now)
"test A.aspx#anchor B" ==> "test%20A.aspx#anchor%20B"
"test A.aspx?hmm#anchor B" ==> "test%20A.aspx?hmm#anchor B" (note the difference with the previous escape sequence!)

It also has the lovelily specific MSDN documentation "Encodes the path portion of a URL string for reliable HTTP transmission from the Web server to a client." - without actually explaining what it does. You are less likely to shoot yourself in the foot with an Uzi...

In short, stick to Uri.EscapeDataString.

Server.UrlEncode vs. HttpUtility.UrlEncode

The Verdict

Tags:

.Net

Asp.Net

Urlencode

Related

Recent Posts