How to encode special characters using mod_rewrite & Apache?

The underlying problem is that you are moving from a part of the request that uses one encoding (in the path, a plus sign is a plus sign) into a part that uses a different encoding (in the query string, a plus sign represents a space). The solution is to bypass the decoding that mod_rewrite does and convert your path directly from the raw request to the query string.

To bypass the normal flow of the rewrite rules, we'll load the raw request string directly into an environment variable and modify that variable instead of the normal rewrite path. The raw request is already percent-encoded, so we generally don't need to worry about encoding it again when we move it to the query string. What we do want, however, is to percent-encode the plus signs so that they arrive as plus signs and not as spaces.

The rules are incredibly simple:

RewriteEngine On

RewriteRule ^script\.php$ - [L]

# Move the path from the raw request into _rq
RewriteCond %{ENV:_rq} =""
RewriteCond %{THE_REQUEST} "^[^ ]+ (/path/[^/]+/[^? ]+)"
RewriteRule .* - [E=_rq:%1]

# encode the plus signs (%2B)  (Loop with [N])
RewriteCond %{ENV:_rq} "/path/([^/]+)/(.*)\+(.*)$"
RewriteRule .* - [E=_rq:/path/%1/%2\%2B%3,N]

# finally, move it from the path to the query string
# ([NE] says to not re-code it)
RewriteCond %{ENV:_rq} "/path/([^/]+)/(.*)$"
RewriteRule .* /path/script.php?%1=%2 [NE]

This trivial script.php confirms that it works:

<input readonly type="text" value="<?php echo $_GET['tag']; ?>" />
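If it helps to see what the three rule groups do to a request line, here is a rough Python simulation. It is only a sketch of the string handling: the function name simulate and the sample request line are made up for illustration, and it does not model anything else mod_rewrite does.

```python
import re
from urllib.parse import parse_qs


def simulate(request_line: str) -> str:
    """Mimic the three rule groups on a raw request line (illustrative only)."""
    # Group 1: capture the still-encoded path from the raw request line.
    m = re.match(r"^[^ ]+ (/path/[^/]+/[^? ]+)", request_line)
    if m is None:
        return ""
    rq = m.group(1)

    # Group 2: replace one "+" per pass, the way the [N] loop does.
    plus = re.compile(r"/path/([^/]+)/(.*)\+(.*)$")
    while (m := plus.search(rq)):
        rq = f"/path/{m.group(1)}/{m.group(2)}%2B{m.group(3)}"

    # Group 3: move the two path segments into the query string.
    m = re.search(r"/path/([^/]+)/(.*)$", rq)
    return f"/path/script.php?{m.group(1)}={m.group(2)}"


rewritten = simulate("GET /path/tag/c++ HTTP/1.1")
print(rewritten)                                # /path/script.php?tag=c%2B%2B
print(parse_qs(rewritten.split("?", 1)[1]))     # {'tag': ['c++']}
```

Because the greedy (.*) in the second pattern always grabs the last remaining plus sign, each pass removes exactly one, and the loop stops once none are left.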

"The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces."

I don't think that's quite what's happening. Apache is decoding the %2Bs to +s in the path part since + is a valid character there. It does this before letting mod_rewrite look at the request.

So then mod_rewrite changes your request '/tag/c++' to 'script.php?tag=c++'. But in a query string in the application/x-www-form-urlencoded format, the escaping rules are very slightly different from those that apply in path parts. In particular, '+' is shorthand for a space (which could just as well be encoded as '%20', but this is an old behaviour we'll never be able to change now).

So PHP's form-reading code receives the 'c++' and puts it in your $_GET as c-space-space.
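The difference between the two decoding rules is easy to reproduce outside Apache and PHP. This is just a small Python sketch using the standard library's URL helpers as a stand-in for what the server and PHP do; the variable names are illustrative.

```python
from urllib.parse import parse_qs, unquote

# In a URL path, "+" is a literal plus sign; only %XX sequences are decoded.
path_value = unquote("c++")

# In a form-encoded query string, "+" is shorthand for a space.
naive = parse_qs("tag=c++")["tag"][0]

# Percent-encoding the plus signs keeps them intact through form decoding.
escaped = parse_qs("tag=c%2B%2B")["tag"][0]

print(repr(path_value))  # 'c++'
print(repr(naive))       # 'c  '
print(repr(escaped))     # 'c++'
```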

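It looks like the way around this is the RewriteRule flag 'B', which makes mod_rewrite re-escape the non-alphanumeric characters in backreferences before they are substituted. See http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags - curiously it uses more or less the same example!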

RewriteRule ^tag/(.*)$ /script.php?tag=$1 [B]
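As a rough picture of what the flag changes, here is a Python sketch of the same rule with and without escaping the captured group. The rewrite function and its escape_backrefs parameter are hypothetical, and urllib's quote is only an approximation of Apache's escaping (the exact set of characters and the hex case may differ).

```python
import re
from urllib.parse import parse_qs, quote


def rewrite(url_path: str, escape_backrefs: bool) -> str:
    """Apply ^tag/(.*)$ -> /script.php?tag=$1, optionally escaping $1."""
    m = re.match(r"^tag/(.*)$", url_path)
    if m is None:
        return url_path
    value = m.group(1)
    if escape_backrefs:
        # Approximation of [B]: percent-encode the captured text.
        value = quote(value, safe="")
    return f"/script.php?tag={value}"


# By the time the rule runs, the path has already been decoded to "tag/c++".
without_b = rewrite("tag/c++", escape_backrefs=False)
with_b = rewrite("tag/c++", escape_backrefs=True)

print(without_b, parse_qs(without_b.split("?", 1)[1]))
print(with_b, parse_qs(with_b.split("?", 1)[1]))
```

Only the second form survives form decoding with the plus signs intact.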

I'm not sure I understand what you're asking, but the NE (noescape) flag to Apache's RewriteRule directive might be of some interest to you. Basically, it prevents mod_rewrite from automatically escaping special characters in the substitution pattern you provide. The example given in the Apache 2.2 documentation is

RewriteRule /foo/(.*) /bar/arg=P1\%3d$1 [R,NE]

which will turn, for example, /foo/zed into a redirect to /bar/arg=P1%3dzed, so that the script /bar will then see a query parameter named arg with a value P1=zed, if it looks in its PATH_INFO (okay, that's not a real query parameter, so sue me ;-P).
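To make the effect of NE a bit more concrete, here is a Python sketch of that substitution with and without a final escaping pass. The function redirect_target and the ASSUMED_SAFE character set are my own assumptions for illustration; Apache's real escaping rules are not reproduced exactly, the point is only that a literal '%' gets escaped again unless NE is set.

```python
import re
from urllib.parse import quote, unquote

# Assumption: characters the default escaping step leaves untouched.
ASSUMED_SAFE = "/?=&"


def redirect_target(url_path: str, noescape: bool) -> str:
    """Apply /foo/(.*) -> /bar/arg=P1%3d$1, with or without re-escaping."""
    m = re.search(r"/foo/(.*)", url_path)
    if m is None:
        return url_path
    target = "/bar/arg=P1%3d" + m.group(1)
    # Without NE, the "%" in the result is itself escaped to "%25".
    return target if noescape else quote(target, safe=ASSUMED_SAFE)


with_ne = redirect_target("/foo/zed", noescape=True)
without_ne = redirect_target("/foo/zed", noescape=False)

print(with_ne, "->", unquote(with_ne))        # decodes to /bar/arg=P1=zed
print(without_ne, "->", unquote(without_ne))  # decodes to /bar/arg=P1%3dzed
```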

At least, I think that's how it works . . . I've never used that particular flag myself.


I finally made it work with the help of RewriteMap.

I added the escape map to the httpd.conf file:

RewriteMap es int:escape

and used it in the rewrite rule:

RewriteRule ([^?.]*) /abc?arg1=${es:$1}&country_sniff=true [L]