Make rule not match URLs ending with more than 1 slash

Post by Forlix » Thu Oct 09, 2008 6:22 am

Hey guys!

I have an image gallery website which I recently switched over to mod_rewrite, and I'm now seeing a great increase in Googlebot & co. activity :D

The links on my site look like this:

Display an image

/idsp/2008090519241505
actual: /scripts/idsp.cgi?id=2008090519241505

Display thumbnail gallery

/thmb/00/1700
actual: /scripts/thmb.cgi?ct=00;lp=1700;

ct = category, lp = list position (linked to page)

I'm using this in .htaccess:

RewriteEngine On

RewriteCond %{REQUEST_URI} ^/idsp [NC]
RewriteRule ^idsp/?([0-9]+)?/?$ /scripts/idsp.cgi?id=$1; [NC,L]

RewriteCond %{REQUEST_URI} ^/thmb [NC]
RewriteRule ^thmb/?([0-9]+)?/?([0-9]+)?/?$ /scripts/thmb.cgi?ct=$1;lp=$2; [NC,L]

Now this works pretty nicely. It also matches /thmb/00 and plain /thmb, in which case the script uses defaults for the missing category and page.

The issue I have is that you can add as many slashes to the end as you like, and it still matches. This of course moves the generated page deeper and deeper into the folder tree. I'm a friend of relative URLs, and my scripts have a function that adapts them automatically, so images and links work flawlessly no matter how many slashes there are or at which folder depth the page is displayed. But when you add a lot of these slashes, the relative paths in the generated page get flooded with ../../../../../../ which is not nice, and I think not very secure either.

I tried to prevent this matching with a lot of rules, but none of them work.

RewriteRule ^test//+$ http://mysite.com/test [R,NC,L]
-> This should match whenever there is more than 1 slash at the end of the URL, and redirect. But it doesn't work.
The rules at the top should not match more than 1 slash at the end in the first place, since it's /?, meaning zero or ONE occurrence... This works fine with, say, letters, but not with the god damn slash. I also tested it in a regex tool and it behaves correctly there, so it must be Apache related.
I read something here or elsewhere that the server might be merging the slashes at some level... not sure about that.
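The suspicion about merged slashes can be illustrated with a small Python sketch. It assumes, as the post guesses, that repeated slashes are collapsed before the .htaccess pattern is tested. The `merge_slashes` helper is hypothetical and only stands in for that step, and Python's `re` module is used in place of Apache's regex engine.

```python
import re

# The thumbnail rule pattern from the post ([NC] becomes re.IGNORECASE).
THMB = re.compile(r"^thmb/?([0-9]+)?/?([0-9]+)?/?$", re.IGNORECASE)


def merge_slashes(path):
    """Hypothetical stand-in for the server collapsing runs of slashes."""
    return re.sub(r"/{2,}", "/", path)


def rule_matches(path, merged):
    """Test the pattern against the raw path or the slash-merged path."""
    candidate = merge_slashes(path) if merged else path
    return THMB.match(candidate) is not None


if __name__ == "__main__":
    for path in ("thmb/00/1700", "thmb/00/1700/", "thmb/00/1700////"):
        # Columns: path, match on raw path, match on merged path
        print(path, rule_matches(path, merged=False), rule_matches(path, merged=True))
```

Under that assumption the pattern itself is behaving as written: it rejects the raw path with four trailing slashes, but it never gets to see that raw path.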

Using Apache 2.2.9

Can anyone shed some light onto this? Thanks!
Forlix
 
Posts: 6
Joined: Thu Oct 09, 2008 5:24 am
Location: Hamburg, Germany

Post by Forlix » Thu Oct 09, 2008 8:12 am

OK, I updated it; the rules from my first post matched some unwanted cases.

RewriteCond %{REQUEST_URI} ^/idsp [NC]
RewriteRule ^idsp/([0-9]+)/?$ /scripts/idsp.cgi?id=$1; [NC,L]
RewriteRule ^idsp/?$ /scripts/idsp.cgi [NC,L]

RewriteCond %{REQUEST_URI} ^/thmb [NC]
RewriteRule ^thmb/([0-9]+)/([0-9]+)/?$ /scripts/thmb.cgi?ct=$1;lp=$2; [NC,L]
RewriteRule ^thmb/([0-9]+)/?$ /scripts/thmb.cgi?ct=$1; [NC,L]
RewriteRule ^thmb/?$ /scripts/thmb.cgi [NC,L]

Is it even any good to add the RewriteCond in front, performance-wise?

Post by richardk » Thu Oct 09, 2008 11:41 am

The conditions are pointless; they test the same thing as the rule pattern.
Code:
Options +FollowSymLinks

RewriteEngine On

RewriteRule ^idsp/([0-9]+)/?$ /scripts/idsp.cgi?id=$1; [NC,L]
RewriteRule ^idsp/?$          /scripts/idsp.cgi        [NC,L]

RewriteRule ^thmb/([0-9]+)/([0-9]+)/?$ /scripts/thmb.cgi?ct=$1;lp=$2; [NC,L]
RewriteRule ^thmb/([0-9]+)/?$          /scripts/thmb.cgi?ct=$1;       [NC,L]
RewriteRule ^thmb/?$                   /scripts/thmb.cgi              [NC,L]
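
As a rough illustration of how this rule set maps the pretty URLs to the scripts, here is a Python sketch that tries the same five patterns in the same order and stops at the first match, which is what the [L] flag does for a single pass. The `RULES` table and the `rewrite` function are hypothetical names, and the input path has no leading slash because that is how the pattern sees it in .htaccess.

```python
import re

# (pattern, substitution) pairs in the same order as the rules above.
RULES = [
    (r"^idsp/([0-9]+)/?$", r"/scripts/idsp.cgi?id=\1;"),
    (r"^idsp/?$", "/scripts/idsp.cgi"),
    (r"^thmb/([0-9]+)/([0-9]+)/?$", r"/scripts/thmb.cgi?ct=\1;lp=\2;"),
    (r"^thmb/([0-9]+)/?$", r"/scripts/thmb.cgi?ct=\1;"),
    (r"^thmb/?$", "/scripts/thmb.cgi"),
]


def rewrite(path):
    """Return the target of the first matching rule, or None if none match."""
    for pattern, target in RULES:
        match = re.match(pattern, path, re.IGNORECASE)  # [NC]
        if match:
            return match.expand(target)  # [L]: stop at the first match
    return None


if __name__ == "__main__":
    for path in ("idsp/2008090519241505", "thmb/00/1700", "thmb/00", "thmb"):
        print(path, "->", rewrite(path))
```

Because every pattern is anchored with ^ and $ and the numeric parts are no longer optional, each URL shape has exactly one rule that can match it.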
richardk
 
Posts: 8800
Joined: Wed Dec 21, 2005 7:50 am

Post by Forlix » Thu Oct 09, 2008 12:23 pm

Thanks.
I found some methods that actually work for removing excessive slashes:

http://www.webmaster-talk.com/website-a ... iency.html

But they are extremely inefficient, killing off the slashes one by one with a redirect each time.
I guess I will have to handle them within my scripts.

Post by richardk » Thu Oct 09, 2008 2:43 pm

This should collapse runs of two or more slashes in up to the first 8 path positions with a single redirect:
Code:
RewriteCond %{REQUEST_URI} //
RewriteCond %{REQUEST_URI} ^/*(?:(/[^/]+)/*(?:(/[^/]+)/*(?:(/[^/]+)/*(?:(/[^/]+)/*(?:(/[^/]+)/*(?:(/[^/]+)/*(?:(/[^/]+)/*(?:(/[^/]+))?)?)?)?)?)?)?)?(/?)/*$
RewriteRule .* %1%2%3%4%5%6%7%8%9 [R=301,L]
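
To see what the second condition captures, here is a Python sketch that rebuilds the same nested pattern programmatically and joins the nine groups the way %1 to %9 are joined in the substitution. It assumes Python's `re` module treats this pattern the same way Apache's regex engine does. `build_pattern` and `collapse` are hypothetical names, and unmatched groups are treated as empty strings.

```python
import re


def build_pattern(depth=8):
    """Rebuild the nested pattern: up to `depth` segments, then an optional slash."""
    nested = r"(?:(/[^/]+))?"  # innermost (last) segment
    for _ in range(depth - 1):
        # Wrap one more "segment, then any number of slashes" layer around it.
        nested = r"(?:(/[^/]+)/*" + nested + r")?"
    return re.compile(r"^/*" + nested + r"(/?)/*$")


PATTERN = build_pattern()


def collapse(uri):
    """Join the captured groups like %1...%9; None means the condition fails."""
    match = PATTERN.match(uri)
    if match is None:
        return None
    return "".join(group or "" for group in match.groups())


if __name__ == "__main__":
    for uri in ("/abc//def/", "/folder//////", "///////folder"):
        print(uri, "->", collapse(uri))
```

In this sketch `collapse("/abc//def/")` gives `/abc/def`: the greedy `/*` after each segment consumes the trailing slashes, so the final `(/?)` group comes back empty. A path with more than eight segments does not match at all, so the condition would fail and the request would be left alone. Whether Apache behaves identically is worth checking on a real server.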

Post by Forlix » Thu Oct 09, 2008 3:33 pm

That works great as far as trailing slashes are concerned, but I need to strip out inline ones as well, since they have the same effect on the relative links.

I made a solution in Perl which works for all cases:

Code:
# Redirect once if the request URI contains a run of two or more slashes.
if ($ENV{'REQUEST_URI'} =~ m{//+}) {
    $ENV{'REQUEST_URI'} =~ s{//+}{/}g;   # collapse every run to a single slash
    print "Status: 301 Moved Permanently\n";
    print "Location: http://www.mydomain.com$ENV{'REQUEST_URI'}\n\n";
    exit;
}

I just add that to the top of my scripts.
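
For comparison, the same idea can be sketched in Python. This is not the code from the post: `redirect_headers` and `HOST` are hypothetical names, and unlike the Perl snippet, which substitutes across the whole REQUEST_URI, this version leaves anything after the `?` untouched. That is an assumption on the editor's side, made because REQUEST_URI normally includes the query string.

```python
import re

HOST = "http://www.mydomain.com"  # placeholder host, as in the Perl snippet


def redirect_headers(request_uri):
    """Return CGI redirect headers if the path has repeated slashes, else None."""
    path, sep, query = request_uri.partition("?")
    if "//" not in path:
        return None  # nothing to clean up, carry on with the normal page
    clean = re.sub(r"/{2,}", "/", path)  # collapse every run to a single slash
    return (
        "Status: 301 Moved Permanently\n"
        "Location: " + HOST + clean + sep + query + "\n\n"
    )


if __name__ == "__main__":
    print(redirect_headers("/thmb//00///1700/"))
```

A single redirect handles every run of slashes at once, whether leading, inline, or trailing, which avoids the one-redirect-per-slash behaviour complained about earlier in the thread.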

Post by richardk » Fri Oct 10, 2008 12:35 pm

Forlix wrote: That works great as far as trailing slashes are concerned, but I need to strip out inline ones as well, since they have the same effect on the relative links.

It should redirect /abc//def/ to /abc/def/. Or do you mean something else?

Post by Forlix » Fri Oct 10, 2008 10:03 pm

richardk wrote: It should redirect /abc//def/ to /abc/def/. Or do you mean something else?

No, that's what I mean all right, but it doesn't work.

Post by richardk » Sat Oct 11, 2008 8:50 am

It does on my test server (2.2.4).

Post by Forlix » Sat Oct 11, 2008 11:35 am

richardk wrote: It does on my test server (2.2.4).

OK, I tested it again. It's working on:

http://domain.com/folder//////
http://domain.com///////folder
http://domain.com///////folder///////

but not on:

http://domain.com///////folder/
http://domain.com///////

But I guess it will do, as multiple slashes directly after the domain don't seem to affect image loading etc. anyway.