I want to build a robot-blocking solution that is a little more elegant than the simple examples found everywhere on the web (which simply write a separate RewriteCond for every robot). I have a really huge list of robots and don't want to write a separate condition for each one. I thought it would be nice (and should be possible) to compare the user agent dynamically against the entries in a file and decide whether the request comes from a robot or not. If it is a robot, it should be redirected to a special robots page; if not, the requested page should be displayed normally. I want to put my rule in the httpd.conf file, not in a .htaccess file.
I am using Apache 1.3.27.
Here is my current attempt:
Code:
<Directory />
...
RewriteEngine On
RewriteBase /
RewriteMap robots prg:D:\test.pl
RewriteRule ^.*$ ${robots:%{HTTP_USER_AGENT}|-}
</Directory>
(does the - mean "no replacement"?)
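As far as I can tell from the mod_rewrite documentation, the default value of a map lookup is separated by |, not by :. So a lookup that falls back to the original request might look like this (untested sketch; I am not sure whether Apache 1.3 expands the $1 backreference inside the default value):

```apache
# Map lookup with a default: known robots get the map value,
# unknown user agents fall back to the originally requested URL ($1).
RewriteRule ^(.*)$ ${robots:%{HTTP_USER_AGENT}|$1}
```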
The Perl script looks like this:
Code:
#!d:\xyz\perl.exe
# prg: rewrite map -- reads one user agent per line on STDIN and
# answers the rewrite target (or "NULL" for "not found") on STDOUT.
$| = 1;    # unbuffered STDOUT is required for prg: maps

my %robots;
open(ROBOTS, "<d:\\xyz\\robots.txt") or die "cannot open robots list: $!";
while (<ROBOTS>) {
    s/\r?\n$//;    # strip CR/LF (Windows line endings)
    $robots{$_} = "/robots";
}
close(ROBOTS);

open(TEST, ">c:\\test.txt") or die "cannot open debug file: $!";
print TEST "Starting\n";

while (<STDIN>) {
    s/\r?\n$//;
    if (exists $robots{$_}) {
        print $robots{$_}, "\n";    # known robot: rewrite to /robots
        print TEST "$_:$robots{$_}\n";
    }
    else {
        print "NULL\n";             # "NULL" tells mod_rewrite the lookup failed
        print TEST "$_:NULL\n";
    }
}
close(TEST);
The problem with this attempt is that the script is never called: no debugging output is generated (no test.txt file appears on C:).
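One way to check the script independently of Apache is to feed it keys on stdin by hand, since a prg: map is just a program that reads one key per line and answers one value per line ("NULL" meaning "not found"). A shell stand-in for the same protocol (the agent names here are hypothetical, just to illustrate):

```shell
# Simulate the prg map protocol: one key in on stdin, one value out per line.
# "Googlebot" stands in for an entry from the robots list (hypothetical).
printf 'Googlebot\nMozilla/4.0 (compatible)\n' | while read ua; do
  case "$ua" in
    Googlebot) echo "/robots" ;;   # known robot -> rewrite target
    *)         echo "NULL"    ;;   # not found -> mod_rewrite keeps the URL
  esac
done
```

Piping a few real user agents into the actual Perl script the same way should show immediately whether the lookup logic itself works.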
The second problem is that even a simple RewriteRule (RewriteRule ^.*$ /robots) in httpd.conf sends the server into an infinite loop, which is logical, because /robots itself matches ^.*$ and the rule fires again on every pass ...
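My current suspicion (untested sketch): the docs say RewriteMap is only allowed in the server-wide or <VirtualHost> context, not inside <Directory>, which might explain why the script is never started. And the loop could perhaps be broken by excluding the robots page itself from the rule, along these lines:

```apache
# Sketch: RewriteMap at server level, not inside <Directory>
RewriteEngine On
RewriteMap robots prg:D:\test.pl

# Only rewrite if the map actually finds the user agent ...
RewriteCond ${robots:%{HTTP_USER_AGENT}|NOT-FOUND} !=NOT-FOUND
# ... and never rewrite the robots page itself (avoids the infinite loop).
RewriteRule !^/robots$ /robots [L]
```

(On Unix a RewriteLock file would also be needed for prg: maps, if I read the docs correctly; I don't know how that behaves on Windows.)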
Does anybody have an idea how to solve my problem?
I would be very very grateful ...
Thx
Bye
Angel
PS: Sorry for my bad English ... I am not a native speaker, as you might have noticed ...