Redirect all initial requests


Post by alhermette » Mon Mar 16, 2009 3:10 pm

I have a script that I want to use for tracking purposes. What I would like to do is send all initial requests for pages (html, htm, php, etc.) to my tracking script, which will record details contained in the query string before redirecting them to the page that they initially requested. Subsequent requests from the same user (e.g. when clicking an internal link) should not be rewritten, to avoid excessive processing.

Is this even possible, and what would be the best way of going about it? (Sorry, but mod_rewrite is totally new to me.) I was thinking of using the HTTP referer to determine whether it is the user's initial request or not, something like this:

# Do not rewrite these file types
RewriteRule \.(txt|gif|jpeg|jpg|png|css|ico|xml|xsl|pdf)$ - [L]

#Rewrite to tracking script
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com/ [NC]
RewriteRule ^/?(.*)$ /tracking-script.php?url=$1 [NC,QSA]

I need to append the original query string to the rewritten URL so that the tracking script has some data to work on, and I also need to add the originally requested URL so that the script can redirect to the correct page.

Any help much appreciated.
alhermette
 
Posts: 18
Joined: Mon Mar 16, 2009 2:44 pm

Post by richardk » Tue Mar 17, 2009 2:21 am

You will have problems. The Referer header is unreliable (it can be removed, or set to a constant or random value). You could set a cookie, but then the user would have to have cookies enabled.

Does it matter if some users go through the script on every request?

Only Referer
Code:
Otpions +FollowSymLinks

RewriteEngine On

RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com(/.*)?$ [NC]
RewriteRule !.\.(txt|gif|jpeg|jpg|png|css|ico|xml|xsl|pdf)$ /tracking-script.php [NC,L]


Only cookie
Code:
Otpions +FollowSymLinks

RewriteEngine On

RewriteCond %{HTTP_COOKIE} !^(.*;\ )?NAME=VALUE(;\ .*)?$ [NC]
RewriteRule !.\.(txt|gif|jpeg|jpg|png|css|ico|xml|xsl|pdf)$ /tracking-script.php [NC,L]


Combined
Code:
Otpions +FollowSymLinks

RewriteEngine On

RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com(/.*)?$ [NC]
RewriteCond %{HTTP_COOKIE} !^(.*;\ )?NAME=VALUE(;\ .*)?$ [NC]
RewriteRule !.\.(txt|gif|jpeg|jpg|png|css|ico|xml|xsl|pdf)$ /tracking-script.php [NC,L]


Quote:
I need to append the original query string to the rewritten URL so that the tracking script has some data to work on, and I also need to add the originally requested URL so that the script can redirect to the correct page.

They should be available in $_SERVER['REQUEST_URI'] and $_SERVER['QUERY_STRING'].
richardk
 
Posts: 8800
Joined: Wed Dec 21, 2005 7:50 am

Post by alhermette » Thu Mar 19, 2009 12:27 am

Thanks for the reply.

Quote:
You will have problems. The Referer header is unreliable (it can be removed, or set to a constant or random value). You could set a cookie, but then the user would have to have cookies enabled.

I appreciate that it may not be as simple to do as it appears. I was just thinking that if the referer is set to my domain then it is pretty certain that it is not the user's first request. I couldn't think of any other way of doing it, but I'm sure that many exist.

Quote:
Does it matter if some users go through the script on every request?

Not really; all I am trying to do is avoid unnecessary processing and redirects. Once the user has passed through the tracking script once, there is no value in doing so a second time, so I was trying to avoid it as far as possible. I like the idea of setting a cookie and using the combination; it seems like it should work reasonably well the majority of the time.

I thought that I needed [QSA] to pass the query string; am I wrong? Also, I have heard of occasional problems with $_SERVER['REQUEST_URI'] not returning what was expected, and I would like to be able to append the originally requested URL (e.g. by adding "&url=some-folder/whatever-page.php") to the query string directly. At least that way I can run some tests and see which way works out best for me.

Your help is much appreciated,

Allan

Post by richardk » Thu Mar 19, 2009 10:31 am

Quote:
Not really; all I am trying to do is avoid unnecessary processing and redirects.

But you won't be able to redirect after the tracking script (e.g. with an HTTP 302), because the redirected request would go through the tracking script again and loop.

Quote:
I thought that I needed [QSA] to pass the query string; am I wrong?

No (you are not wrong).

Quote:
Also, I have heard of occasional problems with $_SERVER['REQUEST_URI'] not returning what was expected, and I would like to be able to append the originally requested URL (e.g. by adding "&url=some-folder/whatever-page.php") to the query string directly.

That would change between hosts, not between requests. But you can add it back if you want. There are also other variables (REDIRECT_URL) that might be more reliable.

Post by alhermette » Mon Mar 23, 2009 1:22 am

OK, here's how I propose to stop the loop created by the redirect in the tracking script.

The ideal solution would be to use a session variable - am I correct in saying that .htaccess can't use session variables though?

As a second option I can append "&TRID=1" to all redirects from the tracking script and then use a condition to eliminate rewriting if this match is found. It won't stop multiple requests from being rewritten to the tracking script but it should break the loop each time.

I then set a cookie if the rewrite conditions are all met which should prevent rewriting from happening on subsequent occasions. I think I am right in saying that the default life of a cookie will be the current session - which is exactly what I need - is that correct?

Now for subsequent requests to go through the tracking script they would have to not declare mysite.com in the HTTP referer (should be rare) and have cookies disabled. If they do end up going through the script again I will catch them using the appended variable to break the loop.

I am also not rewriting requests to certain file types (eg. images). The final rewrite should then append the originally requested url to the query string along with any other variables.

Is this code correct for what I am trying to achieve?

Otpions +FollowSymLinks

RewriteEngine On

RewriteRule \.(txt|gif|jpeg|jpg|png|css|ico|xml|xsl|pdf)$ - [L] #Do not rewrite requests for files of specified type
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com(/.*)?$ [NC] #Do not rewrite requests from this domain
RewriteCond %{HTTP_COOKIE} !^(.*;\ )?TRACKING=DONE(;\ .*)?$ [NC] #Do not rewrite if tracking cookie is set
RewriteCond %{QUERY_STRING} !^(.* )TRID=1(.*)$ [NC] #Do not rewrite if querystring contains TRID=1
RewriteRule ^(.*)$ /tracking-script.php?url=$1 [NC,L,QSA,CO=TRACKING=DONE:.mydomain.com]


Thanks for the help,

Allan

Post by richardk » Tue Mar 24, 2009 2:41 pm

Quote:
The ideal solution would be to use a session variable - am I correct in saying that .htaccess can't use session variables though?

It cannot (or at least not without a RewriteMap and a script).

Quote:
As a second option I can append "&TRID=1" to all redirects from the tracking script and then use a condition to eliminate rewriting if this match is found. It won't stop multiple requests from being rewritten to the tracking script but it should break the loop each time.

That should work.
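
To make the marker idea concrete, here is a minimal sketch of how the redirect target could be built. It is written in Python rather than PHP purely for illustration, and add_marker is a hypothetical helper name, not something taken from the tracking script discussed in this thread.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def add_marker(url, name="TRID", value="1"):
    """Return url with name=value in its query string, added exactly once."""
    parts = urlsplit(url)
    # Drop any existing copy of the marker so it is never duplicated.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != name]
    params.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(add_marker("/some-folder/whatever-page.php"))
print(add_marker("/some-folder/whatever-page.php?ref=abc"))
```

The tracking script would send its redirect to the returned URL, so the follow-up request carries TRID=1 and can be skipped by a RewriteCond on the query string.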

Quote:
I then set a cookie if the rewrite conditions are all met which should prevent rewriting from happening on subsequent occasions. I think I am right in saying that the default life of a cookie will be the current session - which is exactly what I need - is that correct?

Yes. Do you have Apache 2.0.40+? If not you will need to set the cookie with PHP.
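
For illustration only, here is a sketch (in Python, not PHP) of the kind of Set-Cookie header involved. The cookie name, value and domain follow this thread; the Path attribute and the helper name tracking_cookie_header are assumptions. Because the header carries no Expires or Max-Age attribute, a browser would normally treat it as a session cookie.

```python
from http.cookies import SimpleCookie


def tracking_cookie_header(domain=".example.com"):
    """Build a Set-Cookie header for a session-only tracking cookie."""
    cookie = SimpleCookie()
    cookie["TRACKING"] = "DONE"
    cookie["TRACKING"]["domain"] = domain
    cookie["TRACKING"]["path"] = "/"  # assumption: cookie valid site-wide
    # No "expires" or "max-age" is set, so the cookie lasts for the session.
    return cookie.output(header="Set-Cookie:")


print(tracking_cookie_header())
```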

A few small changes (query string matching and setting the cookie):
Code:
Otpions +FollowSymLinks

RewriteEngine On

RewriteRule \.(txt|gif|jpe?g|png|css|ico|xml|xsl|pdf)$ - [L]

RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com(/.*)?$
RewriteCond %{HTTP_COOKIE} !^(.*;\ )?TRACKING=DONE(;\ .*)?$ [NC]
RewriteCond %{QUERY_STRING} !^(.*&)?TRID=1(&.*)?$ [NC]
RewriteRule ^(.*)$ /tracking-script.php?url=$1 [NC,QSA,CO=TRACKING:DONE:.example.com,L]
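
The three RewriteCond lines are ANDed together, so a request is only sent to the tracking script when none of the patterns match. As a rough model of that logic (a Python sketch, not a substitute for testing on the server; should_track is a hypothetical name), the decision looks like this:

```python
import re

# Patterns copied from the rules above; the Referer check has no [NC] flag.
STATIC = re.compile(r"\.(txt|gif|jpe?g|png|css|ico|xml|xsl|pdf)$")
REFERER = re.compile(r"^http://(www\.)?example\.com(/.*)?$")
COOKIE = re.compile(r"^(.*;\ )?TRACKING=DONE(;\ .*)?$", re.IGNORECASE)
QUERY = re.compile(r"^(.*&)?TRID=1(&.*)?$", re.IGNORECASE)


def should_track(path, referer="", cookie="", query=""):
    """Return True if the request would be rewritten to the tracking script."""
    if STATIC.search(path):
        return False  # first rule: static files pass through untouched
    # Each condition is negated with "!", so any single match blocks the rewrite.
    return not (REFERER.search(referer)
                or COOKIE.search(cookie)
                or QUERY.search(query))


print(should_track("index.php"))
print(should_track("index.php", referer="http://www.example.com/about.html"))
print(should_track("index.php", cookie="lang=en; TRACKING=DONE"))
```

A first visit with no matching Referer, cookie or TRID marker is tracked; an internal click, a visitor holding the cookie, or a redirect carrying TRID=1 is not.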

Post by alhermette » Wed Mar 25, 2009 8:34 am

Thanks for your help, it is greatly appreciated.

I installed the code but it is throwing a 500 Internal Server Error. I have an application running in a subfolder that uses its own .htaccess file with mod_rewrite, so I know that mod_rewrite is functioning on the server in general. My hosting company also confirmed to me that it is enabled. The Apache version is 2.2.9.

Here's the error that I am getting:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, webmaster@example.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.
Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.7a mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 PHP/5.2.6 Server at www.example.com Port 80


Here is my .htaccess file's contents in full (the only code that I have added is what you gave me).

# -FrontPage-

IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName example.com
AuthUserFile /home/xyz/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/xyz/public_html/_vti_pvt/service.grp

Otpions +FollowSymLinks

RewriteEngine On

RewriteRule \.(txt|gif|jpe?g|png|css|ico|xml|xsl|pdf)$ - [L]

RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com(/.*)?$
RewriteCond %{HTTP_COOKIE} !^(.*;\ )?TRACKING=DONE(;\ .*)?$ [NC]
RewriteCond %{QUERY_STRING} !^(.*&)?TRID=1(&.*)?$ [NC]
RewriteRule ^(.*)$ /tracking-script.php?url=$1 [NC,QSA,CO=TRACKING:DONE:.example.com,L]


In order to access the site at all I have to comment out "Otpions +FollowSymLinks" and all the lines pertaining to the rewrite rule to the tracking script so it looks like this:

#Otpions +FollowSymLinks

RewriteEngine On

RewriteRule \.(txt|gif|jpe?g|png|css|ico|xml|xsl|pdf)$ - [L]

#RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com(/.*)?$
#RewriteCond %{HTTP_COOKIE} !^(.*;\ )?TRACKING=DONE(;\ .*)?$ [NC]
#RewriteCond %{QUERY_STRING} !^(.*&)?TRID=1(&.*)?$ [NC]
#RewriteRule ^(.*)$ /tracking-script.php?url=$1 [NC,QSA,CO=TRACKING:DONE:.example.com,L]


If I do that it works again, but that rather defeats the whole point! Any idea what could be causing this? I'm sure it's probably something simple, but I have no idea where to start.

Post by richardk » Wed Mar 25, 2009 3:48 pm

Quote:
In order to access the site at all I have to comment out "Otpions +FollowSymLinks"

Typo. It should be Options.

Quote:
and all the lines pertaining to the rewrite rule to the tracking script so it looks like this:

Try replacing
Code:
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com(/.*)?$

with
Code:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com(/.*)?$ [NC]

Post by alhermette » Mon Mar 30, 2009 12:33 am

Thank you - that works perfectly!

I have been running some tests over the weekend and have found one issue that I need to address. In tracking-script.php I am appending some variables depending on several conditions. One of these for example is "TRID=1" in order to stop looping back through the tracking script.

As all initial requests go through the tracking script, they will all end up on a page that has a query string containing "TRID=1". Whilst that is fine for most users, my pages will be indexed by the search engines with this query string as well. If that happens, anyone clicking on a search engine link will bypass the tracking script because of the "TRID=1" in the query string, which defeats the point of having the tracking script.

I have devoted some thought to it and have come up with two possible solutions. The first would be to strip the entire query string if it contains any variables that have been specifically added by the tracking script ("TRID=1" or "DFID=*anything*"). Would this be the correct way to do that:

RewriteCond %{QUERY_STRING} !^(.*&)?TRID=1(&.*)?$ [NC]
RewriteRule ^(.*)$ $1? [NC]
RewriteCond %{QUERY_STRING} !^(.*&)?DFID=(.*)?$ [NC]
RewriteRule ^(.*)$ $1? [NC,L]


The second option, which is the one I am leaning towards, would be to strip out only the specific variables that the tracking script has inserted, leaving the rest of the query string intact, and rewrite the result. I have no idea how I would need to modify the above example to strip out just one variable. I assume that I would also need a third condition to remove the "?" in the case where the resulting query string is empty.
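
To pin down what this second option means, here is a small sketch of the string manipulation involved, written in Python for illustration only. It is not mod_rewrite configuration, and strip_tracking and TRACKING_PARAMS are hypothetical names. It removes only the TRID and DFID variables and drops the "?" when nothing is left.

```python
from urllib.parse import parse_qsl, urlencode

# Variables added by the tracking script, as named in this thread.
TRACKING_PARAMS = {"TRID", "DFID"}


def strip_tracking(path, query):
    """Return path plus query with only the tracking variables removed."""
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if k.upper() not in TRACKING_PARAMS]
    cleaned = urlencode(kept)
    # Omit the "?" entirely when no variables remain.
    return f"{path}?{cleaned}" if cleaned else path


print(strip_tracking("/page.php", "TRID=1"))
print(strip_tracking("/page.php", "a=1&TRID=1&DFID=xyz&b=2"))
```

Note that urlencode re-encodes the remaining values, so unusual characters may come out escaped differently from the original query string.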

Could you also confirm that this condition:

RewriteCond %{QUERY_STRING} !^(.*&)?TRID=1(&.*)?$ [NC]


will still match if "TRID=1" is the only variable in the query string, which would often be the case?
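
One way to check this outside Apache is to run the same pattern through another regex engine. The sketch below uses Python's re module, with re.IGNORECASE standing in for [NC]; the matches helper is a hypothetical name. Both groups in the pattern are optional, so a query string consisting only of "TRID=1" does match, which means the negated condition (the leading "!") is false and the rule is skipped.

```python
import re

# Same pattern as in the RewriteCond, without the leading "!" negation.
PATTERN = re.compile(r"^(.*&)?TRID=1(&.*)?$", re.IGNORECASE)


def matches(query_string):
    """Return True if the query string contains TRID=1 as a whole variable."""
    return PATTERN.search(query_string) is not None


for qs in ["TRID=1", "a=b&TRID=1", "TRID=1&a=b", "trid=1", "TRID=10", "XTRID=1", ""]:
    print(repr(qs), matches(qs))
```

Look-alike variables such as "TRID=10" or "XTRID=1" do not match, because the pattern requires TRID=1 to be bounded by "&" or by the start or end of the string.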

Thanks for your help,

Allan

Post by richardk » Mon Mar 30, 2009 6:32 am

If you remove the variable then it will loop through the tracking script.

I don't see any solution that won't introduce more problems. It might be better to have your tracking script output the file instead of redirecting, or to use the Apache logs that are available.

Do you have access to the httpd.conf file?