Redirecting from subfolder to another subfolder

Post by jshorb » Wed Jul 29, 2009 8:46 am

Most people want to have redirects from root to a subfolder, or a subfolder to a subdomain or root... I have the seemingly unique problem of wanting to redirect from one subfolder to another one. Here is the problem:

I am working on building a mirrored cached site which has all of my static content and then redirects to the dynamic live site when a page is requested that must be dynamic or which must be created (in order to be static later).

The URLs should all look like: http://host.edu/site/x/y/z.html
The cache is located in the %{DOCUMENT_ROOT}/site/ folder with subfolders having files such as: /site/x/y/z.html.gz
The dynamic site is located at %{DOCUMENT_ROOT}/siteDyn/ hosting Joomla with the contentstatic plugin creating cached copies of files in the /site/ folder.

The /site/ folder access is governed by a .htaccess file which has the following directives:

Code: Select all
Options +FollowSymLinks
RewriteEngine On
RewriteRule ^$ /siteDyn [L]
RewriteRule ^index\.html$ /siteDyn/ [L]
RewriteCond %{DOCUMENT_ROOT}/site/%{REQUEST_URI}.gz !-f
RewriteRule ^/site(.+) /siteDyn$1 [L]



The problem I'm finding is that frequently the entire URI is being passed as an argument to /siteDyn, instead of swapping out the 'site' for 'siteDyn' in the URL. I think this has to do with whether or not the RewriteRule is defaulting to full url or just the request.

Any suggestions and/or requests for information?
Thanks,
Justin

PS - this site has a bounty of great resources. Much of the stuff I found through wading into the mod_rewrite apache documentation can be found here in much easier to read format, nice work!

Post by richardk » Wed Jul 29, 2009 4:00 pm

The first two rules work, right? If so, then
Code: Select all
^/site(.+)

is unlikely to match as it has a / before site (and there isn't one before index). Or it could be the other way round.

For /site/x/y/z.html
Code: Select all
%{DOCUMENT_ROOT}/site/%{REQUEST_URI}.gz

would produce /document/root/site//site/x/y/z.html.gz

Try
Code: Select all
Options +FollowSymLinks

RewriteEngine On

RewriteRule ^(index\.html|site/?)?$ /siteDyn/ [L]

RewriteCond %{SCRIPT_FILENAME}.gz !-f
RewriteRule ^site(/.+)$ /siteDyn$1 [QSA,L]
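
For readers following along, the two points above can be mimicked outside Apache. This is a minimal Python sketch, not mod_rewrite itself. It assumes the .htaccess file lives in /site/, so the /site/ prefix is stripped before a RewriteRule pattern is tested, and it reuses /document/root as a stand-in for the real DOCUMENT_ROOT. The helper name per_dir_path is made up for illustration.

```python
import re


def per_dir_path(url_path: str, dir_prefix: str) -> str:
    """Approximate the string a RewriteRule pattern is matched against in a
    .htaccess file: the directory's own URL prefix is removed first."""
    if not url_path.startswith(dir_prefix):
        raise ValueError("request is outside this directory")
    return url_path[len(dir_prefix):]


request_uri = "/site/x/y/z.html"
seen = per_dir_path(request_uri, "/site/")  # what the pattern is tested on

# The original pattern expects a leading "/site", which is no longer there.
original_matches = re.search(r"^/site(.+)", seen) is not None

# A pattern written relative to the directory does match.
relative_matches = re.search(r"^(.+)$", seen) is not None

# REQUEST_URI still holds the full path, so prepending "/site/" doubles it.
document_root = "/document/root"  # placeholder, as in the post above
cond_test_string = f"{document_root}/site/{request_uri}.gz"
```

Here seen is the relative path x/y/z.html, which is why the pattern with the leading /site never fires, while the condition ends up testing a path that contains /site twice.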

What seemed to work

Post by jshorb » Thu Jul 30, 2009 8:52 am

I ended up getting errors doing it your way, but I managed to attack the forums a bit and come up with something that works. It's sort of strange... but this seemed to do the trick as best as I can tell - and it takes care of trailing slash problems with the cache directory.

Cache Directory .htaccess
Code: Select all
Options +FollowSymLinks
RewriteEngine On

# Check to see if requested cached file exists, return it.
RewriteCond %{REQUEST_FILENAME} -f [NC]
RewriteRule ^(.*)$ - [L]
# End of cache check

# Check to see if this is requesting the homepage
RewriteCond %{REQUEST_FILENAME}/ -d
RewriteRule ^(.*)$ /siteDyn/ [E=REQUEST_URI:siteDyn/index.php,NE,L]
# End homepage check

# Redirect to  Dynamic site
RewriteRule ^(.*)$ /siteDyn/$1 [E=REQUEST_URI:siteDyn/$1,NE,L]
# End of redirection


And then, within the dynamic site folder's .htaccess file, I added this line (along with some CMS stuff which was unchanged):

Code: Select all
RewriteRule .* - [E=REQUEST_URI:%{REDIRECT_REQUEST_URI},NE]


This allowed my main site to have access to the correct Server variables... I just included this in the beginning of my CMS's core index.php page:

Code: Select all
// Sets a new 'REQUEST_URI' server variable if there has been a mod_rewrite from cache folder
$_SERVER['REQUEST_URI'] = isset($_SERVER['REDIRECT_REDIRECT_REQUEST_URI']) ? '/'.$_SERVER['REDIRECT_REDIRECT_REQUEST_URI'] : $_SERVER['REQUEST_URI'];
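
As a possible explanation of the doubled prefix: Apache renames environment variables with a REDIRECT_ prefix each time a request is internally redirected. The Python sketch below models that renaming and the fallback in the PHP line above. It assumes two internal redirects take place (the rewrite out of /site/, then the CMS rewrite to index.php); the function names and the sample paths are illustrative only.

```python
def internal_redirect(env: dict) -> dict:
    """Model one internal redirect: every variable name gains a REDIRECT_ prefix."""
    return {f"REDIRECT_{name}": value for name, value in env.items()}


def effective_request_uri(server: dict) -> str:
    """Python rendering of the PHP ternary: prefer the doubly prefixed
    variable when it is set, otherwise keep the normal REQUEST_URI."""
    if "REDIRECT_REDIRECT_REQUEST_URI" in server:
        return "/" + server["REDIRECT_REDIRECT_REQUEST_URI"]
    return server["REQUEST_URI"]


# Variable set by the E= flag in the cache directory's rule (no leading slash).
env = {"REQUEST_URI": "siteDyn/x/y.html"}

# Assumed: two internal redirects happen before index.php runs.
env = internal_redirect(internal_redirect(env))

# What PHP would see: its own REQUEST_URI plus the renamed variable.
server = {"REQUEST_URI": "/site/x/y.html", **env}
rewritten = effective_request_uri(server)
```

Under these assumptions rewritten is the /siteDyn/ path, while a request that never passed through the cache directory keeps its original REQUEST_URI.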


Hopefully that helps anyone with a similar problem. If anyone has a decent explanation of why this works... let me know. I think I get the REQUEST_URI redirection commands, although it seems like there should have been an easier way to do this.

Thanks a lot! I appreciated the suggestion, richardk. I ended up re-thinking a lot about how I did my rules because of that.

Strange occurrence

Post by jshorb » Thu Jul 30, 2009 2:49 pm

So the previous example works pretty well... but I ran across a strange occurrence. My dynamic site creates files as they are used. I have one file called:

Code: Select all
http://www.generic.edu/siteDyn/x.html


which gets stored into the cache once it is generated as:

Code: Select all
/site/x.html.gz


Then, a link from that page onto page y will have the URL and cached versions as follows:

Code: Select all
http://www.generic.edu/siteDyn/x/y.html
/site/x/y.html.gz


Keeping in mind that my above post's .htaccess file will rewrite http://www.generic.edu/site/x/y.html to http://www.generic.edu/siteDyn/x/y.html

The problem is that if I request http://www.generic.edu/site/x/y.html and there is a file already in cache that is called /site/x.html.gz the mod_rewrite conditional:

Code: Select all
RewriteCond %{REQUEST_FILENAME} -f [NC]
RewriteRule ^(.*)$ - [L]


is evaluated as being a file (although none exists at /site/x/y.html.gz), and the server returns /site/x.html.gz instead! I find it strange that I do not need to include the ".gz" explicitly into my RewriteCond line... but if I do include the .gz in this line, the REQUEST_URI becomes http://www.generic.edu/siteDyn/x.html.gz/y.html. I can't really see why mod_rewrite would insert the .html.gz into the middle of the string, given the .htaccess files I have provided above.

I will note, however, that a brute-force fix for this problem is a simple substitution within my CMS's code that corrects the internal REQUEST_URI before it is used (all requests get rerouted to index.php, which then deciphers the SEF URL from the REQUEST_URI):

Code: Select all
$_SERVER['REQUEST_URI'] = isset($_SERVER['REDIRECT_REDIRECT_REQUEST_URI']) ? str_replace('.html.gz','','/'.$_SERVER['REDIRECT_REDIRECT_REQUEST_URI']) : $_SERVER['REQUEST_URI'];


Any thoughts on why this is occurring or a better way to avoid this weird html.gz problem?

I hope this thread becomes useful to others!
Justin

Post by richardk » Thu Jul 30, 2009 3:51 pm

Code: Select all
# Check to see if this is requesting the homepage
RewriteCond %{REQUEST_FILENAME}/ -d
RewriteRule ^(.*)$ /siteDyn/ [E=REQUEST_URI:siteDyn/index.php,NE,L]
# End homepage check

will match any directory, try
Code: Select all
# Check to see if this is requesting the homepage
RewriteRule ^$ /siteDyn/ [E=REQUEST_URI:siteDyn/index.php,NE,L]
# End homepage check


You could probably even shorten it to
Code: Select all
Options +FollowSymLinks

RewriteEngine On

# Check to see if this is requesting the homepage
RewriteRule ^$ /siteDyn/ [E=REQUEST_URI:siteDyn/index.php,NE,L]
# End homepage check

# Redirect to  Dynamic site
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteRule ^(.*)$ /siteDyn/$1 [E=REQUEST_URI:siteDyn/$1,NE,L]
# End of redirection


Also, why aren't you doing
Code: Select all
E=REQUEST_URI:/siteDyn/$1

? (With the / at the beginning.)

I find it strange that I do not need to include the ".gz" explicitly into my RewriteCond line...

It's probably MultiViews not mod_rewrite. Try adding
Code: Select all
Options -MultiViews


What's in the /.htaccess file?
What's in the /siteDyn/.htaccess file?

Post by jshorb » Fri Jul 31, 2009 7:41 am

The 'matching any directory' thing isn't a huge deal, since nobody should be accessing any directory at all - so that's one reason why I put that line in the way it is. Unfortunately, if I use your option:

Code: Select all
# Check to see if this is requesting the homepage
RewriteRule ^$ /siteDyn/ [E=REQUEST_URI:siteDyn/index.php,NE,L]
# End homepage check


That matches only /site/ with a trailing slash, so it won't redirect http://www.generic.edu/site to the homepage. Since my method works for all possible directories, I put it that way. I did a lot of reading on how to add trailing slashes and tried putting an .htaccess into the root directory, but that didn't seem to work.

When I try your method, all I get is 404 errors. I did some checking around, and one thing you have in your posts is 'SCRIPT_FILENAME', which always comes back as an empty string when I print it with a variable dump. That is one reason why I ended up going with REQUEST_FILENAME, which actually had a value.

Your beginning slash suggestion was very, very obvious and I'm glad you pointed it out. It makes the code much smoother. The suggestion of turning off MultiViews also worked marvelously. Thanks!

Below you will find my current two .htaccess files as they stand.

/site/.htaccess
Code: Select all
Options +FollowSymLinks
Options -Multiviews
RewriteEngine On

# Check to see if requested cached file exists, return it.
RewriteCond %{REQUEST_FILENAME}.gz -f [NC]
RewriteRule ^(.*)$ - [L]
# End of cache check

# Check to see if this is requesting the homepage
RewriteCond %{REQUEST_FILENAME}/ -d
RewriteRule ^(.*)$ /siteDyn/ [E=REQUEST_URI:/siteDyn/index.php,NE,L]
# End homepage check

# Redirect to Joomla Dynamic site
RewriteRule ^(.*)$ /siteDyn/$1 [E=REQUEST_URI:/siteDyn/$1,NE,L]
# End of redirection


Full /siteDyn/.htaccess (I'm using Joomla!, so the end is straight from there)
Code: Select all
##  Can be commented out if causes errors, see notes above.
Options +FollowSymLinks

#
#  mod_rewrite in use

RewriteEngine On
###CHANGE ENV VARS IF REDIRECT
RewriteRule .* - [E=REQUEST_URI:%{REDIRECT_REQUEST_URI},NE]

########## Begin - Rewrite rules to block out some common exploits
## If you experience problems on your site block out the operations listed below
## This attempts to block the most common type of exploit `attempts` to Joomla!
#
# Block out any script trying to set a mosConfig value through the URL
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]
# Block out any script trying to base64_encode crap to send via URL
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
# Block out any script that includes a <script> tag in URL
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Send all blocked request to homepage with 403 Forbidden error!
RewriteRule ^(.*)$ index.php [F,L]
#
########## End - Rewrite rules to block out some common exploits

#  Uncomment following line if your webserver's URL
#  is not directly related to physical file paths.
#  Update Your Joomla! Directory (just / for root)

# RewriteBase /


########## Begin - Joomla! core SEF Section
#
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.feed|\.pdf|\.raw|/[^.]*)$  [NC]
RewriteRule (.*) index.php
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
#
########## End - Joomla! core SEF Section


Note that the line preceded by ###CHANGE ENV VARS IF REDIRECT is the only line that I have added myself. The rest is default Joomla! CMS.

Within the Joomla! index.php file, I have added the following:

Code: Select all
$_SERVER['REQUEST_URI']=isset($_SERVER['REDIRECT_REDIRECT_REQUEST_URI'])?$_SERVER['REDIRECT_REDIRECT_REQUEST_URI']:$_SERVER['REQUEST_URI'];


There's the whole story from beginning to end. I have nothing in my .htaccess file in my root directory. Still not sure why the shorthand code you gave for the homepage doesn't work, but things seem to be fairly lean at this point.

Justin

Post by jshorb » Fri Jul 31, 2009 8:24 am

My mistake:

When double-checking that cached pages were actually being served from the cache, I found that with Options -MultiViews the RewriteCond checking for a filename failed (it didn't find the correct filename, so the request ended up being redirected back to the dynamic site instead of the cache). The solution is as follows:

Code: Select all
# Check to see if requested cached file exists, return it.
RewriteCond %{REQUEST_FILENAME}.gz -f [NC]
RewriteRule ^(.*)$ %{REQUEST_FILENAME}.gz [L]
RewriteCond %{REQUEST_FILENAME} -f [NC]
RewriteRule ^(.*)$ - [L]
# End of cache check
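
The lookup order these rules are meant to produce can be sketched in Python. This is a simplified model, not Apache: it ignores the homepage rule, and the function name resolve and the sample files (x.html.gz, style.css) are invented for the demonstration.

```python
import os
import tempfile


def resolve(cache_dir: str, rel_path: str) -> str:
    """Return the URL path that would be served for rel_path, which is
    relative to the cache directory (/site/)."""
    candidate = os.path.join(cache_dir, rel_path)
    if os.path.isfile(candidate + ".gz"):
        # First rule: a gzipped cached copy exists, serve it.
        return "/site/" + rel_path + ".gz"
    if os.path.isfile(candidate):
        # Second rule: the plain file exists, leave the request alone.
        return "/site/" + rel_path
    # Fallthrough: hand the request to the dynamic site.
    return "/siteDyn/" + rel_path


with tempfile.TemporaryDirectory() as cache:
    # Hypothetical cache contents: one gzipped page and one plain file.
    for name in ("x.html.gz", "style.css"):
        with open(os.path.join(cache, name), "wb") as handle:
            handle.write(b"")

    results = {
        path: resolve(cache, path)
        for path in ("x.html", "style.css", "x/y.html")
    }
```

In this model x/y.html goes to the dynamic site even though x.html.gz exists, which is the behaviour wanted once MultiViews is off.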


I tried something of the form:

Code: Select all
RewriteCond %{REQUEST_FILENAME}(.gz)? -f [NC]
RewriteRule ^(.*)$ %{REQUEST_FILENAME}%1 [L]

but that wouldn't work. Things seem to be working nicely with the new set of rules. Thanks!