I have a problem I'm having trouble figuring out, and I wasn't able to find a similar one here on the forum.
I want to redirect a request for a PDF file that comes directly from a search engine (e.g. Google) to an HTML page containing a link to that PDF. That way I can save some bandwidth on people downloading something they don't actually want to read. As my PDF files average 10 MB, this adds up.
I want to do this by rewriting, for example,
http://www.mysite.com/magazine/articles/articles.pdf
to
http://www.mysite.com/magazine/articles/
That way, the directory's index.html gets served instead of the large PDF.
Please note that the directory structure is not fixed: the files can be in different folders at different levels. That is why I only want to match the filename at the end.
I would also like to test whether the extension is .pdf before rewriting, so the server doesn't carry out unnecessary rewrites.
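A minimal sketch of a rule covering those three points, assuming it sits in an .htaccess file in the document root and leaving the search engine exemption aside for now: the pattern itself only matches paths ending in .pdf, so no separate extension test is needed, and the capture group keeps everything up to the last slash, so only the filename is dropped, at any depth.

```apache
# mod_rewrite in .htaccess needs FollowSymLinks (or SymLinksIfOwnerMatch).
Options +FollowSymLinks
RewriteEngine On
# (.*/)? captures the directory part including its trailing slash,
# e.g. "magazine/articles/" for "magazine/articles/articles.pdf";
# it is empty for a PDF in the document root.
# [^/]+\.pdf is the filename itself and is discarded.
# NC makes ".PDF" match too; L stops processing further rules.
RewriteRule ^(.*/)?[^/]+\.pdf$ /$1 [NC,L]
```

Requests that don't end in .pdf never match the pattern, so the rule costs nothing for them.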
At the same time, I want Google and other search engines to index the PDF directly, so the HTML page should not be served to search engine crawlers.
I think I have the search engine part right (only Google in the example below; more will be added), but I can't remove the complete filename, only the .pdf extension (see below). Practically anything else I try gives a server error.
I also don't really know how to test for the file extension.
```apache
Options +FollowSymLinks
RewriteBase /
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !googlebot [NC]
RewriteRule ^/?([a-z/]+)\.pdf$ $1/ [NC]
```
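In the rule above, `([a-z/]+)` captures everything before ".pdf", i.e. "magazine/articles/articles", so `$1/` becomes "magazine/articles/articles/": only the extension is removed. The character class also excludes digits, hyphens and underscores, so filenames containing them would not match at all. Below is a sketch that captures only the directory part instead. The crawler names (bingbot and Slurp alongside Googlebot) are examples, and "mysite.com" is the placeholder domain from the question. The referer condition is an addition: without it, a visitor clicking the PDF link on the index page would be rewritten back to that same page.

```apache
Options +FollowSymLinks
RewriteEngine On
RewriteBase /

# Each RewriteCond applies only to the RewriteRule directly after it,
# and all of them must be true for the rule to fire.

# Let crawlers fetch the PDF itself so it can be indexed
# (example bot names; extend the list as needed).
RewriteCond %{HTTP_USER_AGENT} !(googlebot|bingbot|slurp) [NC]
# Let requests coming from this site's own pages (e.g. the link on
# index.html) through to the real PDF.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?mysite\.com/ [NC]
# Everyone else gets the directory's index page instead of the PDF.
RewriteRule ^(.*/)?[^/]+\.pdf$ /$1 [NC,L]
```

Note that requests with no Referer at all (typed URLs, bookmarks, some privacy settings) are also sent to the index page. Changing the flags to `[NC,R=302,L]` would turn the internal rewrite into a visible redirect, if you prefer the browser's address bar to show the directory URL.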
Your comments and suggestions are much appreciated.
