Plus sign in MediaWiki URLs

Post by titaniumdecoy » Wed Aug 30, 2006 5:09 pm

For days I have been trying to figure out how to allow plus symbols (+) in the URLs of my MediaWiki installation. I found the following in DefaultSettings.php, where a line defines the legal title characters:
Code: Select all
/**
* Allowed title characters -- regex character class
* Don't change this unless you know what you're doing
*
* Problematic punctuation:
*  []{}|#    Are needed for link syntax, never enable these
*  %         Enabled by default, minor problems with path to query rewrite rules, see below
*  +         Doesn't work with path to query rewrite rules, corrupted by apache
*  ?         Enabled by default, but doesn't work with path to PATH_INFO rewrites
*
* All three of these punctuation problems can be avoided by using an alias, instead of a
* rewrite rule of either variety.
*
* The problem with % is that when using a path to query rewrite rule, URLs are
* double-unescaped: once by Apache's path conversion code, and again by PHP. So
* %253F, for example, becomes "?". Our code does not double-escape to compensate
* for this, indeed double escaping would break if the double-escaped title was
* passed in the query string rather than the path. This is a minor security issue
* because articles can be created such that they are hard to view or edit.
*
* Theoretically 0x80-0x9F of ISO 8859-1 should be disallowed, but
* this breaks interlanguage links
*/
$wgLegalTitleChars = " %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\\x80-\\xFF";

I have tried adding both + and \+ and neither works. The comment states that the + symbol "Doesn't work with path to query rewrite rules, corrupted by apache". If anyone could give me a hint, I would be very grateful.

I know this is possible because Wikipedia allows + characters in URLs; for example, http://en.wikipedia.org/wiki/C++ .
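
For reference, here is a minimal PHP sketch of one likely reason the + gets lost with a path-to-query rewrite. It assumes the rewrite places the captured path into the query string unescaped, so PHP sees title=C++ and decodes each + as a space. The variable names are illustrative only.

```php
<?php
// Assumption: after "index.php?title=$1" is applied to /C++, PHP receives
// this query string with the plus signs unescaped.
$query = 'title=C++';

// parse_str() decodes a string the way PHP decodes a URL query string.
parse_str($query, $params);
var_dump($params['title'] === 'C  ');   // true: each "+" became a space

// urldecode() treats "+" as a space; rawurldecode() leaves it alone.
var_dump(urldecode('C++') === 'C  ');   // true
var_dump(rawurldecode('C++') === 'C++'); // true

// An escaped plus sign survives query-string decoding.
parse_str('title=C%2B%2B', $escaped);
var_dump($escaped['title'] === 'C++');  // true
```

If the application then trims trailing whitespace from the title, only "C" would remain. That last step is an assumption about MediaWiki, not something confirmed here.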
titaniumdecoy
 
Posts: 23
Joined: Wed Sep 14, 2005 11:24 am

Post by seomike » Fri Sep 01, 2006 7:06 am

How about posting a few of the RewriteRules you are using?

Then maybe we can see what's going on.
seomike
 
Posts: 331
Joined: Thu May 06, 2004 7:36 pm
Location: Dallas

Post by titaniumdecoy » Fri Sep 01, 2006 8:31 am

Here is my .htaccess file:
Code: Select all
Options +FollowSymLinks
RewriteEngine On

RewriteRule ^$ index.php?title=Main [QSA,L]

RewriteCond %{HTTP_HOST} !ext\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^(.*)$ http://ext.com/$1 [R=permanent,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?title=$1 [L,QSA]
Last edited by titaniumdecoy on Fri Sep 01, 2006 9:23 am, edited 1 time in total.

Post by titaniumdecoy » Fri Sep 01, 2006 9:21 am

I think I know what to do, but I need help.

Using the .htaccess file posted above, I need http://ext.com/C++ to be internally rewritten to http://ext.com/index.php?title=C%2B%2B .

How can I encode + as %2B in the internally rewritten URL? Thanks.

I got the idea from http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_(technical_restrictions)#Plus
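
As a point of comparison, this is what the encoded form looks like when produced in PHP rather than in mod_rewrite. It is only a sketch of the two standard encoding functions and says nothing about how MediaWiki builds its own links.

```php
<?php
// Both functions escape "+" as %2B.
var_dump(rawurlencode('C++') === 'C%2B%2B'); // true
var_dump(urlencode('C++') === 'C%2B%2B');    // true

// They differ on spaces: rawurlencode() writes %20, urlencode() writes "+".
var_dump(rawurlencode('Main Page') === 'Main%20Page'); // true
var_dump(urlencode('Main Page') === 'Main+Page');      // true

// Encoding a string that is already escaped escapes the "%" again.
var_dump(urlencode('C%2B%2B') === 'C%252B%252B');      // true
```

The last line shows why encoding an already-escaped request path a second time changes its meaning.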

Post by richardk » Fri Sep 01, 2006 11:20 am

Does MediaWiki output %2B in its links? If it does, use this:
Code: Select all
Options +FollowSymLinks
RewriteEngine On

RewriteRule ^$ /index.php?title=Main [QSA,L]

RewriteCond %{HTTP_HOST} !^ext\.com$ [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^(.*)$ http://ext.com/$1 [R=301,L]

RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.*)$ /index.php [QSA,L]


Then add this to the top of index.php, so that mod_rewrite won't mess up the %2Bs:
Code: Select all
if(!isset($_GET['title']) && strpos(getenv('REQUEST_URI'), 'index.php') === false)
{
  $_GET['title'] = substr(getenv('REQUEST_URI'), 1);
}
richardk
 
Posts: 8800
Joined: Wed Dec 21, 2005 7:50 am

Post by titaniumdecoy » Fri Sep 01, 2006 11:59 am

I think this is very close to working, but for some reason it doesn't quite work yet.

I tried your code, and it didn't seem to make any difference. So I tried making the following changes (with the same result):

Code: Select all
if(strpos(getenv('REQUEST_URI'), 'index.php') === false)
{
   $_GET['title'] = urlencode(substr(getenv('REQUEST_URI'), 1));
}

I know that any + symbols in the title have been replaced with %2B after this call (because of the urlencode function). However, MediaWiki still strips the + symbols.

I don't know why this should be the case because the following works fine:

http://ext.com/index.php?title=C%2B%2B

This loads a page with the title C++.

What is going on???

Post by richardk » Fri Sep 01, 2006 12:17 pm

The above will only work if %2B is in the links.

You shouldn't urlencode() it because other special characters should already be encoded. As long as + is never used as a space, you could use this:
Code: Select all
if(strpos(getenv('REQUEST_URI'), 'index.php') === false)
{
   $_GET['title'] = str_replace('+', '%2B', substr(getenv('REQUEST_URI'), 1));
}

Then MediaWiki should urldecode() it.

titaniumdecoy wrote:
However, MediaWiki still strips the + symbols.

When? On input or on output?
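
The REQUEST_URI idea used in this thread can be sketched as a small helper. This is a minimal sketch under several assumptions: title_from_request_uri() is a hypothetical name, the wiki lives at the site root, PHP 7.1 or later is available, and values placed in $_GET at runtime are not decoded again by PHP. Whether MediaWiki picks up a title set this way is not confirmed here.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): derive a page title from the
 * raw request URI without turning "+" into a space.
 */
function title_from_request_uri(string $requestUri): ?string
{
    // Drop the query string, if any, so "/C++?action=edit" yields "/C++".
    $path = explode('?', $requestUri, 2)[0];

    // Direct hits on the script are left to the normal ?title= handling.
    if (strpos($path, 'index.php') !== false) {
        return null;
    }

    // Remove the leading slash, then decode %XX escapes only.
    // rawurldecode() does not treat "+" as a space, unlike urldecode().
    $title = rawurldecode(ltrim($path, '/'));

    return $title === '' ? null : $title;
}

// Usage at the top of index.php (REQUEST_URI is unset on the command line).
$title = title_from_request_uri(getenv('REQUEST_URI') ?: '');
if ($title !== null) {
    $_GET['title'] = $title;
}
```

Storing the decoded title avoids relying on a later urldecode() call, since a literal + and an escaped %2B both end up as + in the result.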

Post by titaniumdecoy » Fri Sep 01, 2006 12:33 pm

richardk wrote:
However, MediaWiki still strips the + symbols.

When? On input or on output?

When I type ext.com/C++ it is immediately redirected to ext.com/C (same with ext.com/C%2B%2B). This shouldn't happen because I added "+" to the $wgLegalTitleChars variable. So I can't figure out what the problem is.

After further consideration, I don't think the + symbols should need to be converted to %2B in the code. When I go to ext.com/index.php?title=C++, the $_GET['title'] variable contains + characters.

Post by richardk » Fri Sep 01, 2006 12:51 pm

The last thing to try is "%252B" in the str_replace() instead of "%2B".

If that doesn't work, it has to be a problem with the MediaWiki code. None of %2B, %252B, or + is working, and one of them should. You'll have to ask someone who knows the code, or dive in and find out what happens to the $_GET['title'] variable and where it goes wrong. If Wikimedia can do it, someone must know.

Edit: what does a link with a different special character look like?
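
The %252B suggestion relies on the double-unescaping described in the DefaultSettings.php comment quoted at the top of the thread. A minimal sketch of that effect, using rawurldecode() to stand in for each decoding pass (the real passes happen in Apache and PHP, which is an assumption about this setup):

```php
<?php
// First pass: %25 decodes to "%", leaving an ordinary escape behind.
$once = rawurldecode('%252B');
var_dump($once === '%2B');   // true

// Second pass: the remaining escape decodes to the literal character.
$twice = rawurldecode($once);
var_dump($twice === '+');    // true

// The same chain for the example in the DefaultSettings.php comment.
var_dump(rawurldecode(rawurldecode('%253F')) === '?'); // true
```

If only one decoding pass happens, %252B would arrive as the literal text %2B rather than as a plus sign.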

Post by titaniumdecoy » Fri Sep 01, 2006 12:56 pm

Sigh... that doesn't work. Nor does this code I tried:

Code: Select all
if(strpos(getenv('REQUEST_URI'), 'index.php') === false)
{
   $page_title = str_replace('+', '%2B', substr(getenv('REQUEST_URI'), 1));

   $_GET['title'] = $page_title;
   
   $_SERVER['REQUEST_URI'] = "/index.php?title=$page_title";
}

If I find anything, I'll get back to you. Thanks for your help.
