Rewriting %3F into ?

Postby jbbaxx » Thu Jun 07, 2007 2:38 pm

We have inbound links breaking due to encoding issues. These are links we have submitted to a portal, and they carry variables that we use for tracking. In some cases the question mark that precedes the variables isn't being decoded, so the links contain %3F instead of a question mark:

Code: Select all
http://www.mysite.com/index.php?var=1 //what it should be
http://www.mysite.com/index.php%3Fvar=1 //what it is


Depending on the link, they're either 404ing or 401ing, neither of which is good.

I'd like to do one of two things, both using .htaccess:
1. Replace the %3F with a ?, so at least the link will go through
or
2. Crop the link at the %, so that http://www.mysite.com/index.php%3Fvar=1 will become http://www.mysite.com/index.php

Either solution would be satisfactory.

For #1, I've tried a simple RewriteRule, but mod_rewrite treats % differently, even if escaped.

I've come close with #2. Here's some code that works with %20s:
Code: Select all
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^%]+)\%
RewriteRule .* http://www.mysite.com/%1 [R=301,L]

but I can't get it working with %3Fs.
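
To check the pattern on its own, here is a minimal Python sketch that applies an equivalent regular expression to a raw request line. The function name crop_at_percent and the sample request lines are made up for illustration. It only tests the expression; it says nothing about whether Apache ever gets as far as evaluating the rule.

```python
import re

# Python equivalent of the RewriteCond pattern ^[A-Z]{3,9}\ /([^%]+)\%
# (the escaped space and escaped percent are written literally here).
THE_REQUEST_PATTERN = re.compile(r"^[A-Z]{3,9} /([^%]+)%")


def crop_at_percent(request_line):
    """Return the path up to the first '%', or None if the line has no '%'."""
    match = THE_REQUEST_PATTERN.search(request_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical request lines, shaped like the value of THE_REQUEST.
    for line in (
        "GET /index.php%3Fvar=1 HTTP/1.1",
        "GET /my%20page.html HTTP/1.1",
        "GET /index.php?var=1 HTTP/1.1",
    ):
        print(line, "->", crop_at_percent(line))
```

In this sketch the expression captures index.php from the %3F request line just as it captures the text before a %20, which suggests the pattern itself is not what fails for %3F.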

I've scoured Google and Yahoo, but this seems to be a unique problem. Any help would be greatly appreciated. And a solution would earn someone a virtual pint.
jbbaxx
 
Posts: 4
Joined: Wed Feb 14, 2007 7:21 am

Postby richardk » Fri Jun 08, 2007 2:44 pm

By the time the %3F reaches the RewriteRule it has probably been decoded. Try:
Code: Select all
Options +FollowSymLinks

RewriteEngine On

RewriteRule ^index\.php\?(.*)$ /index.php?$1 [R=301,L]
richardk
 
Posts: 8800
Joined: Wed Dec 21, 2005 7:50 am

Postby jbbaxx » Mon Jun 11, 2007 8:26 am

Thanks, but no luck. This is apparently a bug in Apache that has yet to be fixed, and it could turn into a much bigger problem:

http://www.mail-archive.com/dev@httpd.a ... 36710.html (talks about the bug)
viewtopic.php?=&p=10878 (solution doesn’t work, but looks like it should)
http://www.webmasterworld.com/forum92/3174.htm (this one comes close; crops links that contain a %20)
http://fgiasson.com/blog/index.php/cate ... ogramming/ (this guy talks about a similar bug)

This bug may become a major problem soon. Yahoo has updated the feed system that places links within a URL, and the new method double-encodes the URL:

starts with mysite.com/index.php?variable
then it encodes the question mark mysite.com/index.php%3Fvariable
finally it encodes the percent mysite.com/index.php%253Fvariable

When Google stumbles upon these links, it only decodes them once, which still leaves a scrambled link. The single-decoded link throws up a 404 or 403, so Google logs it as such. Hopefully Google will catch this; otherwise it could have serious implications for a site's search engine placement.
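
The double-encoding steps above can be reproduced with Python's standard urllib.parse module. This is only an illustration of the encoding arithmetic, not of how Yahoo or Google actually process links; the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

original = "mysite.com/index.php?variable"

# First pass: '?' is percent-encoded to %3F ('/' is left alone via safe="/").
encoded_once = quote(original, safe="/")

# Second pass: the '%' itself is encoded to %25, giving %253F.
encoded_twice = quote(encoded_once, safe="/")

# A client that decodes only once gets back the single-encoded form,
# which still contains %3F instead of a real question mark.
decoded_once = unquote(encoded_twice)

if __name__ == "__main__":
    print(encoded_once)
    print(encoded_twice)
    print(decoded_once)
```

A second unquote call on decoded_once would restore the original link, which is the step a single-decoding client skips.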

I'm at my wits' end looking for a solution. I feel like I'm caught in the crossfire of two giants...

Postby richardk » Mon Jun 11, 2007 9:05 am

That bug isn't really affecting you though, because your problem is that you can't match the ?, not that the resultant URL is wrong.

Try
Code: Select all
Options +FollowSymLinks

RewriteEngine On

RewriteRule ^index\.php%3f(.*)$ /index.php?$1 [NC,R=301,L]

Also try adding a \ before %3f.

Or
Code: Select all
Options +FollowSymLinks

RewriteEngine On

RewriteCond %{THE_REQUEST} \ /index\.php%3f([^\ ]*)\  [NC]
RewriteRule ^index\.php.+$ /index.php?%1 [NC,R=301,L]


I think the reason why the ?, + and # aren't encoded is that they are special characters, and there would be the problem of people wanting them not to be encoded. They could leave the URL undecoded, but then it would be hard to match a space (%20) in a character class, etc.

jbbaxx wrote: "The single-decoded link throws up a 404 or 403"

At the very least you can create custom 403 and 404 error pages that can do the redirect instead.

Postby jbbaxx » Mon Jun 11, 2007 10:37 am

Thanks, but no luck.

My research suggests the answer will involve the THE_REQUEST server variable. Because the percent sign appears unescaped and precedes the initial question mark in the URL (or, in this case, what should be the question mark), the ErrorDocument is called before mod_rewrite runs. The client is therefore sent to the error page before .htaccess can do anything about it. I haven't found a solution with THE_REQUEST yet, but there is one for cropping links at a %20:
Code: Select all
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^%]+)\%
RewriteRule .* http://www.example.com/%1 [R=301,L]

The %3F is more elusive.

Using custom error pages won't really work for this problem. The clients that are encountering these encoded links are automated, and don't care if they get redirected to the appropriate page:
"A solution that might work is to use a script for your 404 ErrorDocument and let the script strip the bad characters and redirect. However, by this point, you've already returned a 404-Not Found response to the client, which may make this effort pointless."
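
For reference, the "strip the bad characters" step of such an error script could be sketched in Python roughly as follows. The function name repair_request_uri is hypothetical, and how the script obtains the originally requested URI and sends the redirect depends on the server setup, so neither is shown here.

```python
import re

# Matches a single-encoded (%3F) or double-encoded (%253F) question mark,
# in either letter case.
ENCODED_QUESTION_MARK = re.compile(r"%(?:25)?3f", re.IGNORECASE)


def repair_request_uri(request_uri):
    """Return the URI with its first encoded '?' restored, or None.

    None means there is nothing to repair: either the URI already has a
    real '?', or it contains no encoded question mark at all.
    """
    if "?" in request_uri:
        return None
    repaired, count = ENCODED_QUESTION_MARK.subn("?", request_uri, count=1)
    return repaired if count else None


if __name__ == "__main__":
    # Hypothetical inputs shaped like the broken links in this thread.
    for uri in ("/index.php%3Fvar=1", "/index.php%253Fvar=1", "/index.php"):
        print(uri, "->", repair_request_uri(uri))
```

Only the first encoded question mark is replaced, so any later %3F inside the query string is left as it was.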


At this point, I'm not interested in saving the variables. Is there a way using THE_REQUEST to chop the link at the first percent sign?

Postby richardk » Mon Jun 11, 2007 12:13 pm

You are almost certainly getting a 403 error. The error occurs because ? is a banned file/directory name character on Windows (on Linux it is a legal filename character, so there the failure would more likely show up as a 404). When Apache attempts to find a file or directory named "/document/root/index.php?blah" (after decoding), it causes a 403 error. This happens before the .htaccess files are read, so you cannot use mod_rewrite in the .htaccess file to override this 403 error, or an ErrorDocument defined in the .htaccess file to catch it.

The only way to catch %3f is to use mod_rewrite or an ErrorDocument in a <VirtualHost> (or the main server configuration if there aren't any <VirtualHost>s). You will have to ask your host if they can add mod_rewrite rules to your <VirtualHost>:
Code: Select all
RewriteEngine On
RewriteRule ^(/.*)\?(.*)$ $1?$2 [R=301,L]

Postby jbbaxx » Mon Jun 11, 2007 12:22 pm

Thanks. I'll get with my System Admin and let you know if it works.