Redirecting and rewriting using htaccess

One of the more powerful tricks of the .htaccess hacker is the ability to rewrite URLs. This enables us to do some mighty manipulations on our links; useful stuff like transforming very long URLs into short, cute URLs, transforming dynamic ?generated=page&URLs into /friendly/flat/links, redirecting missing pages, preventing hot-linking, performing automatic language translation, and much, much more.

Make no mistake, mod_rewrite is complex. This isn’t the subject for a quick bite-size tech-snack, probably not even a week-end crash-course. I’ve seen guys pull off some real cute stuff with mod_rewrite, but with kudos-hat tipped firmly towards that bastard operator from hell, Ralf S. Engelschall, author of the magic module itself, I have to admit that a great deal of it still seems so much voodoo to me.

The way that rules can work one minute and then seem not to the next, and the way browser and other in-between network caches interact with rules and with testing rules, is often baffling, maddening. When I feel the need to bend my mind completely out of shape, I mess around with mod_rewrite!

After all this, it does work, and while I’m not planning on taking that week-end crash-course any time soon, I have picked up a few wee tricks myself, messing around with webservers and web sites, this place..

The plan here is to just drop some neat stuff: examples, things that have proven useful and work on a variety of server setups. There are Apaches all over my LAN, and I keep coming across old .htaccess files stuffed with past rewriting experiments that either worked, and I add them to my list, or failed dismally, and I’m surprised that more often these days, I can see exactly why!

Very little here is my own invention. Even the bits I figured out myself were already well documented; I just hadn’t understood the documents, or couldn’t find them. Sometimes, just looking at the same thing from a different angle can make all the difference, so perhaps this humble stab at URL Rewriting might be of some use. I’m writing it for me, of course, but I do get some credit for this..

# time to get dynamic, see..
RewriteRule (.*)\.htm $1.php

Beginning rewriting..

Whenever you use mod_rewrite (the part of apache that does all this magic), you need to do..

you only need to do this once per .htaccess file:

Options +FollowSymlinks
RewriteEngine on

..before any rewrite rules. Note: FollowSymLinks (or SymLinksIfOwnerMatch) must be enabled for rules in an .htaccess file to work; this is a security requirement of the rewrite engine. Normally it’s enabled in the root and you shouldn’t have to add it, but it doesn’t hurt to do so, and I’ll insert it into all the examples on this page, just in case*.

The next line simply switches on the rewrite engine for that folder. If this directive is in your main .htaccess file, then the rewrite engine is theoretically enabled for your entire site, but it’s wise to always add that line before you write any redirections, anywhere.

* Although highly unlikely, your host may have +FollowSymLinks enabled at the root level, yet disallow its addition in .htaccess; in which case, adding +FollowSymLinks will break your setup (probably a 500 error), so just remove it, and your rules should work fine.
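If you’d rather your site didn’t throw errors when mod_rewrite itself isn’t loaded, one common precaution is to wrap the whole preamble in an IfModule block. This is only a sketch, and whether you need it depends on your host; the comment marks where your own rules would go..

```apache
# everything inside is skipped if mod_rewrite is not loaded
<IfModule mod_rewrite.c>
    Options +FollowSymLinks
    RewriteEngine on
    # your rewrite rules go here
</IfModule>
```

Note that this does nothing for the case in the footnote above: if your host forbids Options in .htaccess, that line still has to go.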

Important: While some of the directives on this page may appear split onto two lines, in your .htaccess file, they must exist completely on one line. If you drag-select and copy the directives on this page, they should paste just fine into any text editor.

simple rewriting

Simply put, Apache scans all incoming URL requests, checks for matches in our .htaccess file and rewrites those matching URLs to whatever we specify. Something like this..

all requests to whatever.htm will be sent to whatever.php:

Options +FollowSymlinks
RewriteEngine on
RewriteRule ^(.*)\.htm$ $1.php [NC]

Handy for anyone updating a site from static htm to dynamic php pages (you could match \.html$, \.html?$, \.htm(.*)$, etc, instead); requests to the old pages are automatically rewritten to our new URLs. No one notices a thing; visitors and search engines can access your content either way. Leave the rule in; as an added bonus, this enables us to easily split php code and its included html structures into two separate files, a nice idea that makes editing and updating a breeze. The [NC] part at the end means “No Case”, or “case-insensitive”; more on the switches, later.
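As a rough sketch of the .html variation mentioned above, assuming your old pages end in either .htm or .html, one rule with an optional final “l” can catch both..

```apache
Options +FollowSymLinks
RewriteEngine on
# "l?" makes the final "l" optional, so whatever.htm and whatever.html both match
RewriteRule ^(.*)\.html?$ $1.php [NC]
```

Either way, $1 holds everything before the extension, so both requests end up at whatever.php.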

Folks can link to whatever.htm or whatever.php, but they always get the output of whatever.php in their browser, and this works even if whatever.htm doesn’t exist! But I’m straying..

As it stands, it’s a bit tricky; folks will still have whatever.htm in their browser address bar, and will still keep bookmarking your old .htm URLs. Search engines, too, will keep on indexing your links as .htm, and some have even argued that serving up the same content from two different places could have you penalized by the search engines. This may or may not bother you, but if it does, mod_rewrite can do some more magic..

this will do a “real” external redirection:

Options +FollowSymlinks
RewriteEngine on
RewriteRule ^(.+)\.htm$ http://mysite/$1.php [R,NC]

This time we instruct mod_rewrite to do a proper external rewrite, aka, “redirection”. Now, instead of just background rewriting on-the-fly, the user’s browser is physically redirected to a new URI, and whatever.php appears in their browser’s address bar – search engines and other spidering entities will automatically update their links to the .php versions; everyone wins. You can take your time with the updating, too.

Note: if you use [R] alone, it defaults to sending an HTTP “MOVED TEMPORARILY” redirection, aka, “302”. But you can send other codes, like so..

this performs the exact same as the previous example RewriteRule.

RewriteRule ^(.+)\.htm$ http://mysite/$1.php [R=302,NC]

Okay, I sent the exact same code, but I didn’t have to. For details of the many 30* response codes you can send, see here. Most people seem to want to send 301, aka, “MOVED PERMANENTLY”.

Note: if you add an “L” flag to the mix; meaning “Last Rule”, e.g. [R=302,NC,L]; Apache will stop processing rules for this request at that point, which may or may not be what you want. Either way, it’s useful to know.
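To put the 301 code and the “L” flag together, here is a hedged sketch with two rules. The page names old-news.htm and news.php are made up for illustration; the second rule is the same redirect as before, just sent as permanent..

```apache
Options +FollowSymLinks
RewriteEngine on
# one special case: the (hypothetical) old news page moved to news.php
RewriteRule ^old-news\.htm$ http://mysite/news.php [R=301,NC,L]
# everything else: whatever.htm -> whatever.php
RewriteRule ^(.+)\.htm$ http://mysite/$1.php [R=301,NC,L]
```

With L on the first rule, a request for old-news.htm is redirected straight away and the general rule below it is never tested for that request.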

not-so-simple rewriting … flat links and more

You may have noticed, the above examples use regular expressions to match variables. What that simply means is.. match the part inside (.+) and use it to construct “$1” in the new URL. In other words, (.+) = $1. You could have multiple (.+) parts, and for each, mod_rewrite automatically creates a matching $1, $2, $3, etc, in your target (aka. ‘substitution’) URL. This facility enables us to do all sorts of tricks, and the most common of those is the creation of “flat links”..

Even a cute short link like http://mysite/grab?file=my.zip is too ugly for some people, and nothing less than a true old-school solid domain/path/flat/link will do. Fortunately, mod_rewrite makes it easy to convert URLs with query strings and multiple variables into exactly this, something like..

a more complex rewrite rule:

Options +FollowSymlinks
RewriteEngine on
RewriteRule ^files/([^/]+)/([^/]+)\.zip /download.php?section=$1&file=$2 [NC]

would allow you to present this link as..

http://mysite/files/games/hoopy.zip

and in the background have that transparently translated, server-side, to..

http://mysite/download.php?section=games&file=hoopy

which some script could process. You see, many search engines simply don’t follow our ?generated=links, so if you create generating pages, this is useful. However, it’s only the dumb search engines that can’t handle these kinds of links; we have to ask ourselves.. do we really want to be listed by the dumb search engines? Google will handle a good few parameters in your URL without any problems, and the (hungry hungry) msn-bot stops at nothing to get that page, sometimes again and again and again…
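Going back to that download rule for a moment, here is a sketch of how a third captured group could work. It assumes you also serve .rar files and that download.php understands an extra “type” parameter; both are my inventions for the sake of the example..

```apache
Options +FollowSymLinks
RewriteEngine on
# $1 = section, $2 = file name, $3 = extension (zip or rar)
RewriteRule ^files/([^/]+)/([^/]+)\.(zip|rar)$ /download.php?section=$1&file=$2&type=$3 [NC]
```

A request for http://mysite/files/games/hoopy.rar would then be handed to /download.php?section=games&file=hoopy&type=rar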

I personally feel it’s the search engines that should strive to keep up with modern web technologies, in other words; we shouldn’t have to dumb-down for them. But that’s just my opinion. Many users will prefer /files/games/hoopy.zip to /download.php?section=games&file=hoopy but I don’t mind either way. As someone pointed out to me recently, presenting links as standard/flat/paths means you’re less likely to get folks doing typos in typed URLs, so something like..

an even more complex rewrite rule:

Options +FollowSymlinks
RewriteEngine on
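RewriteRule ^blog/([0-9]+)-([a-z]+) http://mysite/blog/index.php?archive=$1-$2 [NC]

would let you present a link like http://mysite/blog/2003-nov and have it end up at http://mysite/blog/index.php?archive=2003-nov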

Here are the very basics of regexp (expanded from the Apache mod_rewrite documentation)..

Escaping:

\char         Escape that particular char

For instance, to match special characters like [ ] . ( ) \ literally.

Text:

.             Any single character  (as a whole pattern, it matches any non-empty URI)
[chars]       Character class: any one of the listed chars
[^chars]      Character class: any one char not listed
text1|text2   Alternative: text1 or text2 (i.e. “or”)

e.g. [^/] matches any character except /
(foo|bar)\.html matches foo.html and bar.html

Quantifiers:

?   0 or 1 of the preceding text
*   0 or more of the preceding text  (hungry, i.e. greedy)
+   1 or more of the preceding text

e.g. (.+)\.html? matches foo.htm and foo.html
(foo)?bar\.html matches bar.html and foobar.html
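Dropped into an actual rule, the alternative and the optional quantifier might look something like this. It is a sketch only; foo and bar are the same placeholder names as in the examples above..

```apache
Options +FollowSymLinks
RewriteEngine on
# (foo|bar) = either name, "l?" = optional final "l"
# matches foo.htm, foo.html, bar.htm and bar.html; $1 is foo or bar
RewriteRule ^(foo|bar)\.html?$ $1.php
```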

Grouping:

(text)  Grouping of text

Either to set the borders of an alternative or
for making backreferences where the nth group can
be used on the target of a RewriteRule with $n

e.g.  ^(.*)\.html foo.php?bar=$1

Anchors:

^    Start of line anchor
$    End   of line anchor

An anchor explicitly states that the character right next to it MUST
be either the very first character (“^”), or the very last character (“$”)
of the URI string to match against the pattern, e.g..

^foo(.*) matches foo and foobar but not eggfoo
(.*)l$ matches fool and cool, but not foo
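To see why the anchors matter in a real rule, compare the two versions of the .htm rule from earlier on this page. The file name notes.htm.bak is just a made-up example..

```apache
Options +FollowSymLinks
RewriteEngine on
# unanchored (left commented out): this would also fire for notes.htm.bak,
# because ".htm" only has to appear somewhere inside the path
# RewriteRule (.*)\.htm $1.php
# anchored at both ends: fires only when the whole path ends in .htm
RewriteRule ^(.*)\.htm$ $1.php
```

With the unanchored version, a request for notes.htm.bak would be rewritten to notes.php, which is probably not what you had in mind.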

Hope you enjoy this; I guess it will help you build your rewriting concepts 🙂