Blocking spider



Post by Pjotter » Mon Jun 19, 2006 11:31 pm

How do I block MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php?

I tried this:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} majestic12 [NC,OR]
RewriteRule .* /404.htm

But it had no effect.

mod_rewrite is enabled.
Pjotter
 
Posts: 4
Joined: Mon Jun 19, 2006 11:26 pm

Post by richardk » Tue Jun 20, 2006 4:02 am

This might be helpful: http://www.majestic12.co.uk/projects/ds ... p#BlockBot

Or this:
Code:
Options +FollowSymLinks

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} (MJ12bot|majestic12) [NC]
RewriteRule .* - [F]
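To see which User-Agent strings the condition above catches, here is a rough Python sketch. It is only an approximation: Apache evaluates RewriteCond with its own regex engine (PCRE), though Python's `re` behaves the same way for a simple alternation like this, and `[NC]` is modelled with `re.IGNORECASE`. The function name `is_blocked` and every sample string except `MJ12bot/v1.0.8` are made up for illustration.

```python
import re

# Approximates: RewriteCond %{HTTP_USER_AGENT} (MJ12bot|majestic12) [NC]
# [NC] ("no case") is modelled with re.IGNORECASE.
BOT_PATTERN = re.compile(r"(MJ12bot|majestic12)", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the condition would match this User-Agent string."""
    # The pattern has no ^ or $ anchors, so it may match anywhere in the
    # header; re.search (not re.match) is the right analogue.
    return BOT_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "MJ12bot/v1.0.8",                            # version string from the question
        "mj12bot/v1.0.8",                            # lowercase, still caught because of [NC]
        "Mozilla/5.0 (compatible; ExampleBot/1.0)",  # hypothetical unrelated agent
    ]
    for ua in samples:
        print(f"{ua!r}: {'403 Forbidden' if is_blocked(ua) else 'allowed'}")
```

Because the pattern is unanchored, it also catches any agent that merely mentions majestic12 somewhere in the string, which is broader than Pjotter's `^MJ12bot`.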
richardk
 
Posts: 8800
Joined: Wed Dec 21, 2005 7:50 am

Post by Pjotter » Tue Jun 20, 2006 6:47 am

Thanks, I'm using both now.

Post by richardk » Tue Jun 20, 2006 9:03 am

Don't use the mod_rewrite code unless the robots.txt method does not work. The robots.txt method will almost certainly work, since the information comes from the spider's own website. The mod_rewrite code also blocks the spider's access to the robots.txt file, so if you use both, the bot never gets to read your robots.txt.
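For the robots.txt route, the usual form is a `User-agent` line naming the bot followed by `Disallow: /`. The sketch below checks such a file with Python's standard `urllib.robotparser`. It assumes the bot's robots.txt token is `MJ12bot`; `ExampleBot` and the example.com URL are hypothetical. Python's matching logic is its own and is not necessarily identical to MJ12bot's, and like any robots.txt rule it only stops crawlers that choose to obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep MJ12bot out of the whole site,
# leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("MJ12bot/v1.0.8", "ExampleBot/1.0"):
        allowed = rp.can_fetch(agent, "http://example.com/index.html")
        print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

Note that nothing in .htaccess is needed for this; the bot has to be able to fetch /robots.txt for it to work at all.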

Post by Pjotter » Wed Jun 21, 2006 6:43 am

You're right. Didn't think of that.

Any idea why my rule didn't work?

Post by richardk » Wed Jun 21, 2006 6:55 am

I would have thought your rule would cause an infinite loop: once the bot was sent to /404.htm (as a new internal subrequest), that subrequest goes through mod_rewrite again, matches the rule again, is sent to /404.htm again, and so on. It might have worked if you had done something like this:
Code:
# if the bot asks for anything other than /404.htm, send it to /404.htm
RewriteCond %{HTTP_USER_AGENT} (MJ12bot|majestic12) [NC]
RewriteRule !^404\.htm$ /404.htm [L]

Even so, it should still have blocked the bot, with a 500 error once the maximum number of internal redirects was reached.
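The `!` in `!^404\.htm$` negates the pattern, so the rule only fires for requests that are not already for 404.htm. Here is a minimal Python sketch of just that pattern test, assuming the .htaccess file sits in the document root, so the path mod_rewrite tests has no leading slash (`404.htm`, not `/404.htm`). For comparison it also tests the original `.*` pattern, which matches every path including `404.htm`. It does not model Apache's internal redirects or its recursion limit, and the helper names are hypothetical.

```python
import re

# Pattern from: RewriteRule !^404\.htm$ /404.htm [L]  (the "!" is applied in rule_applies)
NOT_404 = re.compile(r"^404\.htm$")
# Pattern from the original attempt: RewriteRule .* /404.htm
ANYTHING = re.compile(r".*")


def rule_applies(path: str) -> bool:
    """True if the negated rule would rewrite this path, i.e. it is not 404.htm."""
    return NOT_404.search(path) is None


def original_applies(path: str) -> bool:
    """True if the original .* rule would rewrite this path (it always does)."""
    return ANYTHING.search(path) is not None


if __name__ == "__main__":
    # Paths as a document-root .htaccess sees them: no leading slash.
    for path in ("index.html", "forum/viewtopic.php", "404.htm"):
        print(path, "| negated rule:", rule_applies(path), "| original .*:", original_applies(path))
```

Only `404.htm` escapes the negated rule, which is what stops the rewritten request from being rewritten again.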

It may also be that FollowSymLinks isn't enabled on your server by default (mod_rewrite in .htaccess files requires it), so you needed to turn it on with:
Code:
Options +FollowSymLinks


Or something else.

Post by Pjotter » Wed Jun 21, 2006 7:47 am

My rule didn't block anything. I thought the rule itself was wrong, but I had forgotten Options +FollowSymLinks.

I have now disallowed the bot in robots.txt. I hope that works, because several people are running this bot and not all of them play by the rules.


Return to Security with Mod_Rewrite
