Reference: mod_rewrite, URL rewriting and "pretty links" explained

To expand on deceze's answer, I wanted to provide a few examples and explanation of some other mod_rewrite functionality.

All of the below examples assume that you have already included RewriteEngine On in your .htaccess file.

Rewrite Example

Lets take this example:

RewriteRule ^blog/([0-9]+)/([A-Za-z0-9-\+]+)/?$ /blog/index.php?id=$1&title=$2 [NC,L,QSA]

The rule is split into 4 sections:

  1. RewriteRule - starts the rewrite rule
  2. ^blog/([0-9]+)/([A-Za-z0-9-\+]+)/?$ - This is called the pattern, however I'll just refer to it as the left hand side of the rule - what you want to rewrite from
  3. blog/index.php?id=$1&title=$2 - called the substitution, or right hand side of a rewrite rule - what you want to rewrite to
  4. [NC,L,QSA] are flags for the rewrite rule, separated by a comma, which I will explain more on later

The above rewrite would allow you to link to something like /blog/1/foo/ and it would actually load /blog/index.php?id=1&title=foo.

Left hand side of the rule

  • ^ indicates the start of the page name - so it will rewrite example.com/blog/... but not example.com/foo/blog/...
  • Each set of (…) parentheses represents a regular expression that we can capture as a variable in the right hand side of the rule. In this example:
    • The first set of brackets - ([0-9]+) - matches a string with a minimum of 1 character in length and with only numeric values (i.e. 0-9). This can be referenced with $1 in the right hand side of the rule
    • The second set of parentheses matches a string with a minimum of 1 character in length, containing only alphanumeric characters (A-Z, a-z, or 0-9) or - or + (note + is escaped with a backslash as without escaping it this will execute as a regex repetition character). This can be referenced with $2 in the right hand side of the rule
  • ? means that the preceding character is optional, so in this case both /blog/1/foo/ and /blog/1/foo would rewrite to the same place
  • $ indicates this is the end of the string we want to match

Flags

These are options that are added in square brackets at the end of your rewrite rule to specify certain conditions. Again, there are a lot of different flags which you can read up on in the documentation, but I'll go through some of the more common flags:

NC

The no case flag means that the rewrite rule is case insensitive, so for the example rule above this would mean that both /blog/1/foo/ and /BLOG/1/foo/ (or any variation of this) would be matched.

L

The last flag indicates that this is the last rule that should be processed. This means that if and only if this rule matches, no further rules will be evaluated in the current rewrite processing run. If the rule does not match, all other rules will be tried in order as usual. If you do not set the L flag, all following rules will be applied to the rewritten URL afterwards.

END

Since Apache 2.4 you can also use the [END] flag. A matching rule with it will completely terminate further alias/rewrite processing. (Whereas the [L] flag can oftentimes trigger a second round, for example when rewriting into or out of subdirectories.)

QSA

The query string append flag allows us to pass in extra variables to the specified URL which will get added to the original get parameters. For our example this means that something like /blog/1/foo/?comments=15 would load /blog/index.php?id=1&title=foo&comments=15

R

This flag isn't one I used in the example above, but is one I thought is worth mentioning. This allows you to specify a http redirect, with the option to include a status code (e.g. R=301). For example if you wanted to do a 301 redirect on /myblog/ to /blog/ you would simply write a rule something like this:

RewriteRule ^/myblog/(*.)$ /blog/$1 [R=301,QSA,L]

Rewrite Conditions

Rewrite conditions make rewrites even more powerful, allowing you to specify rewrites for more specific situations. There are a lot of conditions which you can read about in the documentation, but I'll touch on a few common examples and explain them:

# if the host doesn't start with www. then add it and redirect
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

This is a very common practice, which will prepend your domain with www. (if it isn't there already) and execute a 301 redirect. For example, loading up http://example.com/blog/ it would redirect you to http://www.example.com/blog/

# if it cant find the image, try find the image on another domain
RewriteCond %{REQUEST_URI} \.(jpg|jpeg|gif|png)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*)$ http://www.example.com/$1 [L]

This is slightly less common, but is a good example of a rule that doesn't execute if the filename is a directory or file that exists on the server.

  • %{REQUEST_URI} \.(jpg|jpeg|gif|png)$ [NC] will only execute the rewrite for files with a file extension of jpg, jpeg, gif or png (case insensitive).
  • %{REQUEST_FILENAME} !-f will check to see if the file exists on the current server, and only execute the rewrite if it doesn't
  • %{REQUEST_FILENAME} !-d will check to see if the file exists on the current server, and only execute the rewrite if it doesn't
  • The rewrite will attempt to load the same file on another domain

To understand what mod_rewrite does you first need to understand how a web server works. A web server responds to HTTP requests. An HTTP request at its most basic level looks like this:

GET /foo/bar.html HTTP/1.1

This is the simple request of a browser to a web server requesting the URL /foo/bar.html from it. It is important to stress that it does not request a file, it requests just some arbitrary URL. The request may also look like this:

GET /foo/bar?baz=42 HTTP/1.1

This is just as valid a request for a URL, and it has more obviously nothing to do with files.

The web server is an application listening on a port, accepting HTTP requests coming in on that port and returning a response. A web server is entirely free to respond to any request in any way it sees fit/in any way you have configured it to respond. This response is not a file, it's an HTTP response which may or may not have anything to do with physical files on any disk. A web server doesn't have to be Apache, there are many other web servers which are all just programs which run persistently and are attached to a port which respond to HTTP requests. You can write one yourself. This paragraph was intended to divorce you from any notion that URLs directly equal files, which is really important to understand. :)

The default configuration of most web servers is to look for a file that matches the URL on the hard disk. If the document root of the server is set to, say, /var/www, it may look whether the file /var/www/foo/bar.html exists and serve it if so. If the file ends in ".php" it will invoke the PHP interpreter and then return the result. All this association is completely configurable; a file doesn't have to end in ".php" for the web server to run it through the PHP interpreter, and the URL doesn't have to match any particular file on disk for something to happen.

mod_rewrite is a way to rewrite the internal request handling. When the web server receives a request for the URL /foo/bar, you can rewrite that URL into something else before the web server will look for a file on disk to match it. Simple example:

RewriteEngine On
RewriteRule   /foo/bar /foo/baz

This rule says whenever a request matches "/foo/bar", rewrite it to "/foo/baz". The request will then be handled as if /foo/baz had been requested instead. This can be used for various effects, for example:

RewriteRule (.*) $1.html

This rule matches anything (.*) and captures it ((..)), then rewrites it to append ".html". In other words, if /foo/bar was the requested URL, it will be handled as if /foo/bar.html had been requested. See http://regular-expressions.info for more information about regular expression matching, capturing and replacements.

Another often encountered rule is this:

RewriteRule (.*) index.php?url=$1

This, again, matches anything and rewrites it to the file index.php with the originally requested URL appended in the url query parameter. I.e., for any and all requests coming in, the file index.php is executed and this file will have access to the original request in $_GET['url'], so it can do anything it wants with it.

Primarily you put these rewrite rules into your web server configuration file. Apache also allows* you to put them into a file called .htaccess within your document root (i.e. next to your .php files).

* If allowed by the primary Apache configuration file; it's optional, but often enabled.

What mod_rewrite does not do

mod_rewrite does not magically make all your URLs "pretty". This is a common misunderstanding. If you have this link in your web site:

<a href="/my/ugly/link.php?is=not&amp;very=pretty">

there's nothing mod_rewrite can do to make that pretty. In order to make this a pretty link, you have to:

  1. Change the link to a pretty link:

    <a href="/my/pretty/link">
    
  2. Use mod_rewrite on the server to handle the request to the URL /my/pretty/link using any one of the methods described above.

(One could use mod_substitute in conjunction to transform outgoing HTML pages and their contained links. Though this is usally more effort than just updating your HTML resources.)

There's a lot mod_rewrite can do and very complex matching rules you can create, including chaining several rewrites, proxying requests to a completely different service or machine, returning specific HTTP status codes as responses, redirecting requests etc. It's very powerful and can be used to great good if you understand the fundamental HTTP request-response mechanism. It does not automatically make your links pretty.

See the official documentation for all the possible flags and options.