Redirection with Mod_Rewrite

Information and tutorials covering what you can and can't do with mod_rewrite, regular expressions, creating rules, htaccess placement.

Post by Mad Mod_Rewriter » Thu Jan 19, 2006 1:29 pm

URL Redirection with Mod-Rewrite
Creating the Rules

The Building Blocks of Mod-Rewrite URL Redirection Rules: Special Characters


Along with regular expressions, mod-rewrite allows for the use of special characters. It's a good thing to understand what these are before you begin writing rules. (Mainly because you need one or more of them in almost every rule.)


RewriteRule tells the server to interpret the following information as a rule.


RewriteCond tells the server to interpret the following information as a condition of the rule that comes immediately after it. (Several conditions can be stacked above one rule, but they only apply to that one rule.)


^ defines the beginning of a 'line' (starting anchor). Remember, ^ also designates 'not' when it is the first character inside square brackets in a regular expression (EG [^/] means 'anything that is not a /'), so please don't get confused.


( ) creates a variable to be stored and possibly used later, and is also used to group text.


$ defines the ending of a 'line' (ending anchor). Followed by a number ($1, $2, ...), it also defines a variable that comes from the RewriteRule (used for variables on the right side of the equation, or to match a variable from the rule in a condition, see example below).


% followed by a number (%1, %2, ...) defines a variable that comes from a rewrite condition. (used for variables on the right side of the equation, see example below)


* The right side of the equation is the substitution: the part of a RewriteRule that follows the pattern (in these examples, everything after the $ that ends the pattern).


Examples: All variables are numbered in the order they appear. The following rule and condition each have two variables, defined by parentheses, so to use them you put $1 and $2 (or %1 and %2) where you need them in the result:
(the '-' is for spacing only to make the line more readable, and is not necessary to use variables.)


RewriteRule ^(var1)/no-var/(var2)$ /to-use-variables-type-$1-and-$2
The final result would look like this:
to-use-variables-type-var1-and-var2


RewriteCond %{CONDITION_STUFF} ^(var1)/no-var/(var2)
RewriteRule ^no-var/no-var/no-var$ /to-use-variables-type-%1-and-%2
The final result would look like this:
to-use-variables-type-var1-and-var2


To use a combination of the Condition and Rule Variables:
RewriteCond %{CONDITION_STUFF} ^(var1)/no-var/(var2)
RewriteRule ^(var1)/no-var/(var2)$ /to-use-variables-type-$1-and-%2-$2
The final result would look like this:
to-use-variables-type-var1-and-var2-var2
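
For a more concrete picture, here is a rough sketch that swaps the CONDITION_STUFF placeholder for a real server variable. It assumes the rules live in an .htaccess file in the root of the site and that you want www.yoursite.com sent to yoursite.com; the host name is only an example, and the flags in [ ] are explained further down.

```apache
RewriteEngine On

# %1 is filled by the condition: the host name with the leading "www." removed
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]

# $1 is filled by the rule: the path that was requested
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
```

With these two lines, a request for http://www.yoursite.com/page.html should be redirected to http://yoursite.com/page.html (%1 holds yoursite.com and $1 holds page.html).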


The only exception to the above examples is that you can also use the %{CONDITION_STUFF} in the right side of a rule, but it must appear exactly as in the condition:
RewriteRule ^(var1)/no-var/(var2)$ /type-%{CONDITION_STUFF}

| (bar) stands for 'or', normally used with text or expressions grouped with parentheses. (EG (with|without) matches the string 'with' or the string 'without'. Keep in mind that since these are inside parentheses, the match is stored as a variable.)


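\ is called an escaping character; it removes the function from a 'special character'. (EG if you needed to match index.php?, which has both a .(dot) and a ?, you would 'escape' both with a \ to remove their 'special' value, so it looks like this: index\.php\? Keep in mind that the pattern of a RewriteRule is only compared against the path of the URL, never the query string, so to test what comes after a real ? you need a RewriteCond on %{QUERY_STRING}.)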

! stands for 'not', like the ^ inside square brackets in a regular expression, but it can only be used at the very beginning of the pattern in a rule or condition, not in the middle.

- on the right side of the equation stands for 'No Rewrite.' (It is often used in conjunction with a condition to check and see if a file or directory exists.)
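
To see |, \, ! and - working together, here is a minimal sketch. It assumes an .htaccess file in the site root; the images/ directory and the index.php 'catch everything' script are made-up names for illustration, not something your site necessarily has.

```apache
RewriteEngine On

# "!" negates the condition pattern, "\." matches a literal dot and
# (gif|jpg|png) uses "|" for 'or': anything requested under images/
# that does not end in one of these extensions gets a 403.
RewriteCond %{REQUEST_URI} !\.(gif|jpg|png)$ [NC]
RewriteRule ^images/ - [F]

# "-" means 'No Rewrite': if the request points at a real file or
# directory, leave the URL alone and stop here.
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]

# Everything else is handed to index.php (hypothetical script).
RewriteRule ^ /index.php [L]
```

The order matters: the forbidding rule comes first, because the 'No Rewrite' rule would otherwise let any existing file through before the check is made.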


Mod-Rewrite Directives for URL Redirection


Directives in mod-rewrite (the Apache documentation calls them flags) are what give you control of the response sent by the server when a specific URL is requested. They are an integral part of the rule writing process, because they designate any special instructions that might be needed. (EG If I want to tell everyone a page is moved permanently, I can add R=301 to my rule and they will know.)


Directives follow the rule and are enclosed in [ ]. More than one can be used on a rule by separating them with commas, as in [R=301,L]. (Not all directives are covered here, but the main and widely used ones are.)


[R] stands for redirect. Used on its own, [R] works and sends a 302 (temporarily moved). You can set another status code by entering it as [R=301] or [R=YourNumberHere]; the redirect codes run from 300 to 399, and 301 (permanently moved) and 302 (temporarily moved) are the most common.

** Do not use this 'flag' or directive if you are trying to have a 'silent' redirect.
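
The difference between a visible and a 'silent' redirect can be sketched like this. The file names are invented for the example, and the rules are assumed to sit in an .htaccess file in the site root.

```apache
RewriteEngine On

# External redirect: the browser is told (301) to request the new URL,
# so the address bar changes to /new-page.html.
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]

# 'Silent' rewrite: no [R], so the visitor keeps seeing /about in the
# address bar while the server quietly delivers about.php.
RewriteRule ^about$ /about.php [L]
```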


[F] stands for forbidden. Any URL or file that matches the rule (and condition(s) if present) will return 403 Forbidden to anyone who tries to access it. (Useful for files that you would like to keep private, or you do not want indexed prior to 'going live' with them.)

[G] stands for gone. (It's like Not Found, only different: the 410 code tells the visitor the page was removed on purpose.) Not recommended for use yet; this is a less commonly used message, and many browsers and user-agents, like googlebot, do not understand it yet.
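
Both [F] and [G] are normally paired with - on the right side, since there is nothing to rewrite to. A small sketch, with a made-up directory and file name:

```apache
RewriteEngine On

# 403 Forbidden for anything under drafts/ (hypothetical directory).
RewriteRule ^drafts/ - [F]

# 410 Gone for a page that was removed on purpose (hypothetical file).
RewriteRule ^summer-sale-2005\.html$ - [G]
```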

[P] stands for proxy. This creates a type of 'silent redirect' for files or pages that are not actually part of your site, and can be used to serve pages from a different host as though they were part of your site. It only works if mod_proxy is enabled on the server. (DO NOT mess with copyrighted material, some of us get very upset.)
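
As a rough illustration, assuming mod_proxy and mod_proxy_http are loaded and that blog.example.com is a host you control (the name is a placeholder):

```apache
RewriteEngine On

# Requests for /blog/anything are fetched from the other host and
# returned to the visitor as if they were part of this site.
RewriteRule ^blog/(.*)$ http://blog.example.com/$1 [P,L]
```

The visitor's address bar keeps showing your own site, because the fetch happens on the server side.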


[NC] stands for 'No Case' as applied to letters, so if you use this on a rule, MYsite.com will match mysite.com... even though they are not the same. (This can also be used with regular expressions, so instead of [a-zA-Z], you can use [a-z] and [NC] at the end of the rule for the same effect.)

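[QSA] stands for Query String Append. By default, the 'query string' (stuff after the ?) of the original URL (the one we are rewriting) is passed along unchanged, unless the new URL contains a query string of its own, in which case the original one is dropped. [QSA] tells the server to keep both: the original query string is appended to the one in the new URL.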
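
A short sketch of where [QSA] makes a difference. The product.php script and its id parameter are assumptions made up for the example:

```apache
RewriteEngine On

# The new URL has its own query string (id=...), so [QSA] is needed
# to keep whatever the visitor added after the ? as well.
RewriteRule ^product/([0-9]+)$ /product.php?id=$1 [QSA,L]
```

A request for /product/42?color=red ends up as /product.php?id=42&color=red. Take [QSA] off, and it ends up as /product.php?id=42 with the color lost.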

[L] stands for last rule. As soon as a rule with this 'flag' or directive matches, no other rules are processed for that pass through the rules. (Every rule should contain this flag, until you know exactly what you are doing.)


In an attempt to put together regular expressions and mod-rewrite special characters here are some examples of what they do:

Goal: to match any lowercase words, or group of letters:
Possible Matches: lfie, page, site, or information
Expression: [a-z]+
Explanation: [a-z] matches any single lowercase letter. + matches 1 or more of the previous character or group of characters. When you put the two together, you have a regular expression that matches any lowercase letter from a to z, over and over, until it runs into a character that is not a lowercase letter.


Goal: to match any words, or groups of letters, and store them in a variable:
Possible Matches: lfie, Page, site, or InforMation
Expression: ([a-z]+) [NC]
Explanation: Same as above with the addition of () and [NC]. In mod-rewrite, () creates a single variable out of the regular expression, so the word matched is now in a variable. [NC], which stands for 'No Case' (from mod-rewrite), makes the regular expression or regular text strings match both upper and lowercase letters, so with this expression you can match any single word.
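
Dropped into a full rule, this expression might look like the sketch below. The page.php script and its name parameter are hypothetical, and the ^ and $ anchors are added so the whole path has to be one word.

```apache
RewriteEngine On

# /Page, /site or /InforMation are all stored in $1 and handed to
# page.php as the "name" parameter.
RewriteRule ^([a-z]+)$ /page.php?name=$1 [NC,L]
```

A request for /Page would be served by /page.php?name=Page. The rewritten URL contains a dot, so it does not match the rule a second time.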

Goal: to match any word, or group of letters, then any single number, and store them in separate variables:
Possible Matches: lfie1, Page2, site6, or InforMation9
Expression: ([a-z]+)([0-9]) [NC]
Explanation: Same as above, with a second set of () for the number. Notice there is no + in the number expression, so only a single digit will match.
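
As a sketch, here are the two variables used separately on the right side. The page.php script and both parameter names are made up for the example:

```apache
RewriteEngine On

# $1 holds the letters and $2 holds the single digit.
RewriteRule ^([a-z]+)([0-9])$ /page.php?name=$1&part=$2 [NC,L]
```

A request for /Page2 becomes /page.php?name=Page&part=2, while /Page22 does not match at all, because only one digit is allowed before the end of the line.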

Goal: to match any word, or group of letters, then any single number, and store them in the same variable:
Possible Matches: lfie1, Page2, site6, or InforMation9
Expression: ([a-z]+[0-9]) [NC]
Explanation: Same as above, except both expressions are inside one set of (). Notice the + immediately follows (no space) the [a-z], but comes before the [0-9] (again no space), so the + affects the [a-z], but not the [0-9].

Goal: to match any word, or group of letters, then any group of numbers, and store them in the same variable:
Possible Matches: lfie11, Page2, site642, or InforMation9987653
Expression: ([a-z]+[0-9]+) [NC]
Explanation: Same as above with the addition of a + immediately following the numerical expression, to match 1 or more digits instead of only 1.

Goal: to match any word, or group of letters, any group of numbers, and any random letters and numbers, which might or might not be mixed together:
Possible Matches: 11, gPaE, s17ite642, or 2CreateInfo4UisCool
Expression: ([a-z0-9]+) [NC]
Explanation: The change here is to the regular expression grouping. Putting a-z and 0-9 inside the same square brackets, followed by + and [NC], matches any combination of letters and numbers.


Goal: to match any word, or group of letters, then a single /, then any group of numbers, and store only the numbers in a variable.
Possible Matches: lfie/10, gPaE/1, site/642, or CreateInfoUisCool/2474890
Expression: [a-z]+/([0-9]+) [NC]
Explanation: Using the [a-z]+ without () matches the letters as usual. By putting the / outside of any expression, the only thing that will match is the exact character of /. Then using the ([0-9]+) again, stores any group of numbers in a variable.
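
Put into a rule, this could look something like the following. The item.php script and its id parameter are invented for illustration:

```apache
RewriteEngine On

# The word before the / must be there, but is thrown away.
# Only the number is kept, in $1.
RewriteRule ^[a-z]+/([0-9]+)$ /item.php?id=$1 [NC,L]
```

Both /site/642 and /gPaE/642 would be served by /item.php?id=642.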

Goal: to match anything before the / and store it in a variable, then match anything after the / and store it in a separate variable:
Possible Matches: lfie/10.html, gP..aE/1page_two.file, si&#te/642-your-site, or CreateInfo/245390.php
Expression: ([^/]+)/(.+)
Explanation: Using two new forms of regular expressions, this is actually easier than it may seem. Putting the ^ (not) character first inside the square brackets matches anything that is not a /, and the () again save it in a variable. Then, in the same form as above, the single, exact character / is matched. Finally, the . (dot) character is used, because it matches any single character that is not the end of a line, and when combined with the + character, it matches anything up to a line break. Once again () are used to create the variable. *Also, notice that using 'catch-alls' eliminates the need for the [NC] 'flag' of mod-rewrite.
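
One possible use of the two catch-alls, sketched with a hypothetical index.php script and made-up parameter names:

```apache
RewriteEngine On

# $1 is everything before the first /, $2 is everything after it
# (including any further slashes).
RewriteRule ^([^/]+)/(.+)$ /index.php?section=$1&page=$2 [L]
```

A request for /lfie/10.html becomes /index.php?section=lfie&page=10.html. The rewritten URL has no / after its first part, so it does not match the rule again.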

If this was a full regular expression site, I would continue, but you should have an idea of how regular expressions work, so, time to move on...


Things to Remember About Mod-Rewrite URL Redirection


1. If you are using a condition(s), they only relate to the single rule that immediately follows them. If a second rule needs the same condition(s), repeat them above that rule.


2. Mod-Rewrite will always try to match a URL to a rule before it checks the conditions, so if no rules match, the conditions are never checked.


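3. After a URL request matches a rule, and changes are applied, the result is treated like it is a new URL request and run through the rules again. (With a 'silent' rewrite in .htaccess the server does this itself; with [R] the browser does it by requesting the new URL.)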


(This is the cause of an infinite loop, and with a regular expression and variables, it is sometimes easy to miss. The following examples show very simply how it happens... there are cases where two or three or more rules write to each other and have the same effect.)


Pretend someone wanted to go to your site and visit a page called 'letters.html', but you wanted to redirect them somewhere else like 'numbers.html':

They request the URL:

http://yoursite.com/letters.html



Your rule catches their URL request:

RewriteRule ^([a-z]+)\.html$ /numbers.html [R,NC,L]



Your rule then rewrites their request to the URL:

http://yoursite.com/numbers.html



Their request starts over like it is a new request for the URL:

http://yoursite.com/numbers.html



Your rule catches their URL request, because you are using a regular expression that catches all letters:

RewriteRule ^([a-z]+)\.html$ /numbers.html [R,NC,L]



Your rule then rewrites their request to the URL:

http://yoursite.com/numbers.html



Their URL request starts over like it is a new request for:

http://yoursite.com/numbers.html


Eventually they see a 500 server error, or 'maximum redirects exceeded', or your server melts and your hosting company calls you and wants to know why. (Kidding about the server melting, but rewrite rules can also be placed in the server set-up, and an infinite loop there can force a restart to break it.)


Keep in mind, this did not happen because your rule didn't work... It worked too well, over and over and over...
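
One common way out of the loop above is to add a condition that excludes the page you are redirecting to. This is only a sketch, and it assumes the rule sits in an .htaccess file in the site root:

```apache
RewriteEngine On

# Skip the rule when the request is already for numbers.html,
# so the redirect only happens once.
RewriteCond %{REQUEST_URI} !^/numbers\.html$ [NC]
RewriteRule ^([a-z]+)\.html$ /numbers.html [R,NC,L]
```

Now letters.html is still redirected to numbers.html, but when the request for numbers.html comes back around, the condition fails and the page is simply served.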