Bots attack
The bots are now pulling a huge CPU load, making it impossible to browse the site.
I need a solution to ban every user agent other than MSN, Google, Yahoo, and a couple of others. I don't care whether they are valid or not; I just don't want them taking any of my resources.
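Something along these lines in .htaccess is roughly what I have in mind (a rough sketch only; the agent patterns are guesses, and ordinary browsers have to be matched too or human visitors get locked out):

# Sketch: allow only named crawlers plus normal browsers, deny everything else
SetEnvIfNoCase User-Agent "(msnbot|googlebot|slurp|mozilla|opera)" allowed_agent
Order Deny,Allow
Deny from all
Allow from env=allowed_agent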
robots.txt and .htaccess have both failed to limit these bad bots.
I'm told there are scripts that run in the background on Linux that can detect bad bot behavior and ban them automatically, adding them to a blocklist and so improving the site's efficiency.
If you guys have ideas or such scripts, please do help.
Thanks
17 Replies
Fail2ban has a bad-bot jail built in if you use Debian.
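Enabling the bundled apache-badbots jail in /etc/fail2ban/jail.local would look roughly like this (the log path and ban time are assumptions for a stock Apache install; adjust to taste):

# Sketch: ban on first match against the bundled bad-bot user-agent filter
[apache-badbots]
enabled  = true
port     = http,https
filter   = apache-badbots
logpath  = /var/log/apache2/access.log
maxretry = 1
bantime  = 86400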
I am trying out fail2ban.
Hopefully it will reduce the bots/crawlers.
The Apache access log is empty.
Any ideas why it would be blank?
Do I need to enable anything in apache2.conf for the log to start filling?
Thanks in advance.
If you have used any of the Linode Library guides, your logs will be located elsewhere.
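You can ask Apache itself where it is writing logs; assuming the standard /etc/apache2 layout, something like:

# Find every CustomLog directive in the Apache config tree
grep -Ri customlog /etc/apache2/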
Fail2ban is not working.
CPU load is continuously at elevated levels.
Any more ideas, or any powerful script to ban spiders?
You can block individual IPs in .htaccess:
Order Deny,Allow
Deny from xxx.xxx.xxx.xxx
And if you see multiple attempts from different hosts on the same Class C network, you can even block at the network level:
Order Deny,Allow
Deny from xxx.xxx.xxx
I do this to block IPs that are trying to break through the CAPTCHA on one of my websites.
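If Apache itself is burning CPU just answering the bots, it can also help to drop the traffic at the firewall so it never reaches the web server at all. A sketch with a placeholder network (192.0.2.0/24 here stands in for the real Class C):

# Drop all packets from the offending /24 before Apache ever sees them
iptables -A INPUT -s 192.0.2.0/24 -j DROP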
While installing AWStats on Debian, the paths are not all updated in the AWStats configuration file. In particular, I need help updating the following path variables, which are provided for an earlier version of Debian.
$AWSTATS_PATH='';
$AWSTATSICONPATH='/usr/share/awstats/icon';
$AWSTATSCSSPATH='/usr/share/doc/awstats/examples/css';
$AWSTATSCLASSESPATH='/usr/share/doc/awstats/examples/classes';
$AWSTATSCGIPATH='/usr/lib/cgi-bin';
$AWSTATSMODELCONFIG='/etc/awstats/awstats.model.conf'; # Used only when configure ran on linux
$AWSTATSDIRDATAPATH='/var/lib/awstats';
Please point me to any updated documentation on AWStats for Debian, or if anyone has it installed with these parameters set, please help.
I am not able to see stats through the browser, which I think is because these variables are not set correctly.
awstats_configure.pl should more or less make sure that all paths correspond to what you have on your system. Did you run the Perl script?
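On Debian the script ships with the package documentation, so running it is roughly the following (the path is an assumption based on the stock package layout):

# Run the bundled configure script to rewrite the path variables
cd /usr/share/doc/awstats/examples
perl awstats_configure.pl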
I have made all possible changes to my.cnf and apache2.conf to get a stable system, but to no avail.
I checked AWStats and found that Yahoo Slurp is causing the most trouble.
So I blocked Yahoo Slurp through robots.txt and .htaccess.
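(The standard robots.txt rule to shut Slurp out completely, for anyone following along:

User-agent: Slurp
Disallow: /

Slurp is supposed to honor it, but that of course relies on the crawler behaving.)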
Slurp somehow still manages to hit my site. I even blocked its IP range (67.195..), but now I find it crawling from other addresses under *.crawl.yahoo.net.
Is there any way to block Yahoo completely off my site? It is one @#$@ of a company.
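Next I am going to try matching on the User-Agent header rather than the IP, since Slurp rotates through addresses; something like this in .htaccess (a rough sketch):

# Deny any request whose User-Agent contains "slurp", allow everyone else
SetEnvIfNoCase User-Agent "slurp" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot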
Output from the top command:
lollipop:~# htop
-bash: htop: command not found
lollipop:~# top
top - 15:43:38 up 1 day, 2:30, 1 user, load average: 24.34, 22.14, 18.43
Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
Cpu(s): 18.1%us, 81.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.1%st
Mem: 1417440k total, 839028k used, 578412k free, 9284k buffers
Swap: 524280k total, 3996k used, 520284k free, 161652k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7650 mysql 18 0 483m 44m 4576 S 399 3.2 58:13.00 mysqld
7872 root 15 0 2268 1140 880 R 0 0.1 0:00.84 top
1 root 15 0 1992 568 540 S 0 0.0 0:00.00 init
2 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0
3 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1
5 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1
6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/2
7 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2
8 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/3
9 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3
10 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/0
11 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/1
12 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/2
13 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/3
14 root 20 -5 0 0 0 S 0 0.0 0:00.00 khelper
15 root 11 -5 0 0 0 S 0 0.0 0:00.00 kthread
17 root 11 -5 0 0 0 S 0 0.0 0:00.00 xenwatch
@Guspaz:
Are you sure that it isn't your own site at fault? An unindexed table can cause huge load with a small amount of traffic.
This might be the issue.
I just can't see how web crawlers could slow down a website this much. They don't make more than a few requests a minute, so their visits shouldn't be the issue.
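Given that mysqld was pegged at roughly 400% CPU in that top output, it is worth checking whether the busiest queries are doing full table scans. A sketch from the shell (the table and column names are made up; substitute a real slow query from your own logs):

# EXPLAIN shows the query plan; "type: ALL" means a full table scan
mysql -u root -p -e "EXPLAIN SELECT * FROM posts WHERE author_id = 42;"
# If the plan shows a full scan, adding an index on the filtered column may help
mysql -u root -p -e "ALTER TABLE posts ADD INDEX idx_author (author_id);"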