Hide .html extension (redirect version with .html)

Discuss practical ways to rearrange URLs using mod_rewrite.

Post by hm78 » Tue Jun 02, 2009 5:04 am

In general

I want to hide the file extension .html.
And if the user enters /foobar.html, he shall be redirected to /foobar.

Special cases

If there is a file /foobar.html and a folder /foobar/ in the same directory,
the user shall get the file if he enters /foobar, and the folder if he enters /foobar/.

If there is a file /foobar.html and a file /foobar.pdf (or any other extension except .html) in the same directory,
the user shall get the HTML version if he enters /foobar.

Examples

(left of the arrow: the actual file/folder on the server; right of the arrow: the only valid URL for accessing it)
  • /sport.html --> /sport
  • /sport.htm --> /sport.htm
  • /cooking.pdf --> /cooking.pdf
  • /garden/ --> /garden/
  • /garden.html --> /garden
hm78
 
Posts: 17
Joined: Sun Mar 04, 2007 8:26 pm

Post by richardk » Tue Jun 02, 2009 10:25 am

hm78 wrote: If there is a file /foobar.html and a folder /foobar/ in the same directory,
the user shall get the file if he enters /foobar, and the folder if he enters /foobar/.

To do that, you would have to turn off Apache's default behaviour of adding trailing slashes to directories (mod_dir's DirectorySlash directive). Please read about it first.
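One caveat from the mod_dir documentation is worth knowing before turning DirectorySlash off: if directory listings are enabled (Options +Indexes), a request for a directory without its trailing slash can show the directory listing instead of the index.html inside it. A minimal sketch, assuming no directory listings are needed anywhere below this .htaccess, is to disable them on the same Options line:

```apache
# Assumption: no directory listings are wanted below this .htaccess.
# -Indexes stops mod_autoindex from listing a directory's contents, which
# matters once DirectorySlash Off lets /garden (no slash) reach a directory.
Options +FollowSymLinks -MultiViews -Indexes
```

With -Indexes, such a request gets a 403 Forbidden response instead of a listing.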

Try
Code:
Options +FollowSymLinks -MultiViews

<IfModule mod_rewrite>
  DirectorySlash Off

  RewriteEngine On

  RewriteCond %{SCRIPT_FILENAME}/ -d
  RewriteCond %{SCRIPT_FILENAME}.html !-f
  RewriteRule [^/]$ %{REQUEST_URI}/ [R=301,L]

  RewriteCond %{ENV:REDIRECT_STATUS} ^$
  RewriteRule ^(.+)\.html$ /$1 [R=301,L]

  RewriteCond %{SCRIPT_FILENAME}.html -f
  RewriteRule [^/]$ %{REQUEST_URI}.html [QSA,L]
</IfModule>


Edit: Forgot the quote text.
Last edited by richardk on Tue Jun 02, 2009 3:51 pm, edited 1 time in total.
richardk
 
Posts: 8800
Joined: Wed Dec 21, 2005 7:50 am

Post by hm78 » Tue Jun 02, 2009 11:47 am

I only have access via .htaccess. So I bet I have to remove these "<IfModule mod_rewrite>" and "</IfModule>" lines?

I tried the following:

Code:
Options +FollowSymLinks -MultiViews
DirectorySlash Off

RewriteEngine On

  RewriteCond %{SCRIPT_FILENAME}/ -d
  RewriteCond %{SCRIPT_FILENAME}.html !-f
  RewriteRule [^/]$ %{REQUEST_URI}/ [R=301,L]

  RewriteCond %{ENV:REDIRECT_STATUS} ^$
  RewriteRule ^(.+)\.html$ /$1 [R=301,L]

  RewriteCond %{SCRIPT_FILENAME}.html -f
  RewriteRule [^/]$ %{REQUEST_URI}.html [QSA,L]


And -- wow -- it works! This is great, thank you :D

Additions to my requirements:

/index.html becomes /index. It would be cool if this "index" could be stripped off too (it's only the name of the file Apache looks for first in a folder, so it offers no benefit to the user).

Questionable:

If /foobar (a file without any file extension) is in the same folder as /foobar.html, this .htaccess code serves the HTML version when you call /foobar. The problem is that you can then never get the file /foobar itself. Might there be a solution for this? I don't think I'll ever upload files without a file extension, but who knows... it's not that important, though.
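One possible answer, sketched here and not tested in this thread: give the last rule of richardk's ruleset (the one that appends .html) an extra condition, so that .html is only appended when the requested path is not an existing file itself.

```apache
# If the request exists with a .html extension...
RewriteCond %{SCRIPT_FILENAME}.html -f
# ...and the request itself is not an existing file (e.g. an extensionless /foobar),
RewriteCond %{SCRIPT_FILENAME} !-f
# ...and there is no trailing slash, rewrite to add the .html extension.
RewriteRule [^/]$ %{REQUEST_URI}.html [QSA,L]
```

Note that this reverses the trade-off: when both /foobar and /foobar.html exist, /foobar serves the extensionless file, and the HTML version becomes unreachable because /foobar.html is still redirected to /foobar.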

I'm trying to understand what the code does:


Please correct me!

DirectorySlash Off:
I read about it (though I don't understand all of the English). As far as I understand, it disables the automatic addition of a trailing slash when the requested name matches a folder name. But I wonder ... it still does add a trailing slash, hm ...

SCRIPT_FILENAME
= the name of the requested file

REQUEST_URI
= SCRIPT_FILENAME + potential query strings

----------------------------------------------------------------------------------------

RewriteCond %{SCRIPT_FILENAME}/ -d:
checks if [the requested file + a trailing slash] IS a folder (d: directory). If it's not, stop here?

RewriteCond %{SCRIPT_FILENAME}.html !-f:
checks if [the requested file + the extension .html] IS NOT a file (f: file). If it is a file, stop here?

RewriteRule [^/]$ %{REQUEST_URI}/ [R=301,L]:
if the two conditions are true, this RewriteRule gets executed:
checks if the entered URL does NOT end with a trailing slash. If so, it adds a trailing slash to the URL (301 redirect) and stops here (L).

----------------------------------------------------------------------------------------

RewriteCond %{ENV:REDIRECT_STATUS} ^$:
I'm not sure. Does it mean: "if nothing got redirected until now" (so: if the conditions of the first block were not true)?

RewriteRule ^(.+)\.html$ /$1 [R=301,L]:
I don't get it

----------------------------------------------------------------------------------------

RewriteCond %{SCRIPT_FILENAME}.html -f:
checks if [the requested file + .html] IS a file.

RewriteRule [^/]$ %{REQUEST_URI}.html [QSA,L]
I don't get it

Post by richardk » Tue Jun 02, 2009 4:10 pm

hm78 wrote: I only have access via .htaccess. So I bet I have to remove these "<IfModule mod_rewrite>" and "</IfModule>" lines?

You don't have to, and you shouldn't. It's there to stop DirectorySlash from being processed if mod_rewrite isn't available.

hm78 wrote: /index.html becomes /index. It would be cool if this "index" could be stripped off too (it's only the name of the file Apache looks for first in a folder, so it offers no benefit to the user).

Try adding
Code:
RewriteCond %{THE_REQUEST} \ /(.+/)?index\.html(\?.*)?\  [NC]
RewriteRule ^(.+/)?index\.html$ /%1 [R=301,L]

after
Code:
RewriteEngine On


hm78 wrote: But I wonder ... it still does add a trailing slash, hm ...

The mod_rewrite rules add some trailing slashes themselves (when there isn't a matching HTML file).

hm78 wrote: SCRIPT_FILENAME
= the name of the requested file

It is the path to the file on the server. For example, /home/username/public_html/directory/file.ext.

hm78 wrote: REQUEST_URI
= SCRIPT_FILENAME + potential query strings

It is the part of the URL after the domain name and before the query string. For example, for http://www.example.com/directory/file.e ... ef&ghi=jkl:

HTTP_HOST = www.example.com
REQUEST_URI = /directory/file.ext
QUERY_STRING = abc=def&ghi=jkl

hm78 wrote: RewriteCond %{ENV:REDIRECT_STATUS} ^$:
I'm not sure. Does it mean: "if nothing got redirected until now" (so: if the conditions of the first block were not true)?

REDIRECT_STATUS is empty for new requests, including the new requests that follow a mod_rewrite redirect (R=301). It is not empty after mod_rewrite rewrites a request (an internal request). This is used to stop loops: e.g. you request /file, it gets rewritten to /file.html, and this condition stops the rule from removing the .html extension again.
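The same loop protection can also be written by testing THE_REQUEST, which holds the original request line sent by the browser (e.g. GET /file.html HTTP/1.1) and is not changed by internal rewrites. It is the approach richardk uses for the index.html rule; here is a sketch applying it to the extension-removal rule, offered as an alternative to the REDIRECT_STATUS check rather than a tested replacement:

```apache
# THE_REQUEST is the raw request line, e.g. "GET /dir/file.html?a=b HTTP/1.1".
# It only matches when the browser itself asked for a .html URL, so the
# internal rewrite of /file to /file.html cannot trigger this redirect.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?\ ]+)\.html[?\ ]
# Redirect to the extensionless URL; the query string is passed on unchanged.
# NE keeps percent-encoded characters captured from THE_REQUEST from being
# escaped a second time.
RewriteRule \.html$ /%1 [NE,R=301,L]
```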

Code:
# Turn MultiViews off. (MultiViews on causes /abc to go to /abc.ext.)
Options +FollowSymLinks -MultiViews

<IfModule mod_rewrite>
  # Disable mod_dir adding missing trailing slashes to directory requests.
  DirectorySlash Off

  RewriteEngine On

  # If it's a request to index.html
  RewriteCond %{THE_REQUEST} \ /(.+/)?index\.html(\?.*)?\  [NC]
  # Remove it.
  RewriteRule ^(.+/)?index\.html$ /%1 [R=301,L]

  # Add missing trailing slashes to directories if a matching .html does not exist.
  # If it's a request to a directory.
  RewriteCond %{SCRIPT_FILENAME}/ -d
  # And an HTML file does not (!) exist.
  RewriteCond %{SCRIPT_FILENAME}.html !-f
  # And there is no trailing slash, redirect to add it.
  RewriteRule [^/]$ %{REQUEST_URI}/ [R=301,L]

  # Remove HTML extensions.
  # If it's a request from a browser, not an internal request by Apache/mod_rewrite.
  RewriteCond %{ENV:REDIRECT_STATUS} ^$
  # And the request has an HTML extension. Redirect to remove it.
  RewriteRule ^(.+)\.html$ /$1 [R=301,L]

  # If the request exists with a .html extension.
  RewriteCond %{SCRIPT_FILENAME}.html -f
  # And there is no trailing slash, rewrite to add the .html extension.
  RewriteRule [^/]$ %{REQUEST_URI}.html [QSA,L]
</IfModule>


hm78 wrote: if the two conditions are true, this RewriteRule gets executed:

The RewriteRule pattern is tested first, then the RewriteConds in order, and only then is the RewriteRule substitution applied.
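To make that order concrete, here is the trailing-slash rule from above annotated step by step, using a hypothetical request for /garden where a directory garden/ exists but no garden.html:

```apache
# Step 1: the pattern [^/]$ is tested against the path "garden".
#         It ends in a non-slash character, so processing continues.
# Step 2: the conditions are checked in order.
#         .../garden/ is a directory            -> true
RewriteCond %{SCRIPT_FILENAME}/ -d
#         .../garden.html is not a file          -> true
RewriteCond %{SCRIPT_FILENAME}.html !-f
# Step 3: only now is the substitution applied: redirect to /garden/.
#         Had the pattern in step 1 not matched, the conditions would
#         not have been evaluated at all.
RewriteRule [^/]$ %{REQUEST_URI}/ [R=301,L]
```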

Post by hm78 » Tue Jun 02, 2009 5:11 pm

Wow, thanks for the explanations and the commented code :)

richardk wrote: You don't have to, and you shouldn't. It's there to stop DirectorySlash from being processed if mod_rewrite isn't available.


Oh, okay. I tried it with <IfModule mod_rewrite>, but then the RewriteRules no longer worked. Then I found a solution on a German forum: use <IfModule mod_rewrite.c> instead. It seems to work.
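For reference, <IfModule> accepts either the module's source file name (mod_rewrite.c) or, on Apache 2.1 and later, its module identifier (rewrite_module). Plain "mod_rewrite" is neither, so the test fails and the whole block is skipped, which matches what happened here. A sketch of the wrapper with the file-name form (rules abbreviated):

```apache
Options +FollowSymLinks -MultiViews

# "mod_rewrite.c" is the module's file name; "rewrite_module" (the module
# identifier, Apache 2.1+) would work too. "mod_rewrite" alone matches
# neither, so everything inside the block would be silently skipped.
<IfModule mod_rewrite.c>
  DirectorySlash Off
  RewriteEngine On

  # ... the rewrite rules from richardk's post go here ...
</IfModule>
```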

richardk wrote: Try adding
Code:
RewriteCond %{THE_REQUEST} \ /(.+/)?index\.html(\?.*)?\  [NC]
RewriteRule ^(.+/)?index\.html$ /%1 [R=301,L]



I added these two lines after RewriteEngine On.
If I enter /index.html, it becomes / (good!).
But if I enter /index, it stays /index (not good; I want to remove "index" there, too).

Performance?
What do you think: are these "easy" tasks for the server? Could/should I use it for all of my websites, or does it put stress on the server (slowing down delivery for visitors?), so that I should only use it when it's really needed? At the moment, I'm thinking of adding this to my standard .htaccess for every site (which currently consists only of a www. to non-www. redirect).

Post by richardk » Wed Jun 03, 2009 12:23 pm

hm78 wrote: But if I enter /index, it stays /index (not good; I want to remove "index" there, too).

Replace both occurrences of
Code:
\.html

with
Code:
(\.html)?
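Applied to the two index lines, that substitution gives the following (a sketch; keep them right after RewriteEngine On as before):

```apache
# Matches browser requests for "index" or "index.html" in any directory.
RewriteCond %{THE_REQUEST} \ /(.+/)?index(\.html)?(\?.*)?\  [NC]
# Redirect to the directory itself, e.g. /foo/index -> /foo/ and /index.html -> /.
RewriteRule ^(.+/)?index(\.html)?$ /%1 [R=301,L]
```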


hm78 wrote: Performance?

I doubt you'll notice anything. You should change the links in your pages so they don't include the .html extension; then it will not have to redirect as often.

The real question is: does having .html really matter? If you think it's worth it, do it.

Post by hm78 » Wed Jun 03, 2009 4:23 pm

It works.

Thanks again, you really do a great job :)

