Is a slash ("/") equivalent to an encoded slash ("%2F") in the path portion of an HTTP URL

From the data you gathered, I would tend to say that encoded "/" in an uri are meant to be seen as "/" again at application/cgi level.

That's to say, that if you're using apache with mod_rewrite for instance, it will not match pattern expecting slashes against URI with encoded slashes in it. However, once the appropriate module/cgi/... is called to handle the request, it's up to it to do the decoding and, for instance, retrieve a parameter including slashes as the first component of the URI.

If your application is then using this data to retrieve a file (whose filename contains a slash), that's probably a bad thing.

To sum up, I find it perfectly normal to see a difference of behaviour in "/" or "%2F" as their interpretation will be done at different levels.


The story of %2F vs / was that, according to the initial W3C recommendations, slashes «must imply a hierarchical structure»:

The slash ("/", ASCII 2F hex) character is reserved for the delimiting of substrings whose relationship is hierarchical. This enables partial forms of the URI.

Example 2

The URIs

http://www.w3.org/albert/bertram/marie-claude

and

http://www.w3.org/albert/bertram%2Fmarie-claude

are NOT identical, as in the second case the encoded slash does not have hierarchical significance.


I also have a site that has numerous urls with urlencoded characters. I am finding that many web APIs (including Google webmaster tools and several Drupal modules) trip over urlencoded characters. Many APIs automatically decode urls at some point in their process and then use the result as a URL or HTML. When I find one of these problems, I usually double encode the results (which turns %2f into %252f) for that API. However, this will break other APIs which are not expecting double encoding, so this is not a universal solution.

Personally I am getting rid of as many special characters in my URLs as possible.

Also, I am using id numbers in my URLs which do not depend on urldecoding:

example.com/blog/my-amazing-blog%2fstory/yesterday

becomes:

example.com/blog/12354/my-amazing-blog%2fstory/yesterday

in this case, my code only uses 12354 to look for the article, and the rest of the URL gets ignored by my system (but is still used for SEO.) Also, this number should appear BEFORE the unused URL components. that way, the url will still work, even if the %2f gets decoded incorrectly.

Also, be sure to use canonical tags to ensure that url mistakes don't translate into duplicate content.

Tags:

Http

Url

Encoding