Rampant Web Spider
They're not actually loading pages, though. The bot, or at least some instances of it, gets stuck on the 302 my site returns to redirect to the www. version of the URL. Thank goodness it isn't actually loading the root of my website that many times, or I'm pretty sure the site would be unusable. I also came to the conclusion today that it probably can't even fetch robots.txt (since it can't get past the redirect), because it's been blocked there for a while now. What's odd is that I've seen certain Twiceler bots crawl my site properly.
I wrote to the company before, but nothing's been done. They asked for my log files even though I'd told them exactly what the problem was, with plenty of detail on the attempts and IPs. The log was full of nearly identical lines, sometimes even sharing a timestamp because the requests come so quickly (as I explained and showed them), but I sent them the huge log of thousands of lines anyway.
I even checked Google a moment ago, and it appears I'm not the only one getting hit by this thing.
How annoying is it that people can't control their bots, and don't even pull them down despite knowing they have a problem? I'd bet this happens to a lot of folks, many without even realizing it, since a lot of websites have redirections in place. I really don't know what to do about it short of banning the dozens of IPs I've seen so far, or telling them to add me to a block list the guy mentioned. But that's not a solution, especially if it's still affecting others.
I wrote them again today because I'm pretty much growing tired of it (over 4000 attempts to crawl my site cluttering up my logs when I got up this morning), but I won't hold my breath after all the months that've gone by seeing the problem.
3 Replies
@harmone:
Can you make everything redirect but the robots.txt file? Maybe that would solve your problem for bots that cannot understand redirects.
That's a good idea!
Though I just got an email back from them, and they said that the IP from this morning, and many of the others in the log file I sent before, weren't theirs. They also put a list of IPs on their site, and they all resolve to *.cuill.com. So that's odd; according to them, it seems many bots are masquerading as theirs.
If that's the case, I almost feel bad for thinking poorly of them! But now I'm left to try and just ban all those other IPs I guess, since they're fakes, and probably the same fakes causing others problems too.
EDIT: Hmm, checking Google, it seems other people having problems are getting them from the real bots. So I really dunno what to think anymore.
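For anyone wanting to try the suggestion quoted above (redirect everything except robots.txt), here is a rough sketch of how it might look with mod_rewrite in an .htaccess file. It assumes mod_rewrite is enabled and that the www redirect is done with rewrite rules rather than some other mechanism; example.com is a placeholder hostname, not the poster's real domain.

```apache
# Sketch only: example.com is a placeholder, not a hostname from this thread.
RewriteEngine On

# Skip the redirect for robots.txt so that crawlers which cannot
# follow a 302 can still read it on the bare (non-www) hostname.
RewriteCond %{REQUEST_URI} !^/robots\.txt$

# Only redirect requests whose Host header lacks the www. prefix.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]

# Send everything else to the www. hostname, keeping the requested path.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=302,L]
```

With something like this in place, a request for /robots.txt on the bare hostname should be served directly, while every other path is still redirected. Whether that stops the repeated requests depends on the bot actually honoring robots.txt once it can read it.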
SetEnvIfNoCase User-Agent ".*noxtrumbot.*" spambot=1
SetEnvIfNoCase User-Agent ".*Indy Library.*" spambot=1
SetEnvIfNoCase User-Agent ".*Zeus.*" spambot=1
SetEnvIfNoCase User-Agent ".*linko.*" spambot=1
SetEnvIfNoCase User-Agent ".*imagefetch.*" spambot=1
SetEnvIfNoCase User-Agent ".*urniti.*" spambot=1
SetEnvIfNoCase User-Agent ".*kuloko-bot.*" spambot=1
SetEnvIfNoCase User-Agent ".*nameprotect.*" spambot=1
SetEnvIfNoCase User-Agent ".*grub-client.*" spambot=1
SetEnvIfNoCase User-Agent ".*WebCopier.*" spambot=1
SetEnvIfNoCase User-Agent ".*Zyborg.*" spambot=1
SetEnvIfNoCase User-Agent ".*WebZIP.*" spambot=1
SetEnvIfNoCase User-Agent ".*Downloader.*" spambot=1
SetEnvIfNoCase User-Agent ".*Ninja.*" spambot=1
SetEnvIfNoCase User-Agent ".*OmniExplorer_Bot.*" spambot=1
SetEnvIfNoCase User-Agent ".*omni-explorer.*" spambot=1
SetEnvIfNoCase User-Agent ".*NG/2.0.*" spambot=1
SetEnvIfNoCase User-Agent ".*WebStripper.*" spambot=1
SetEnvIfNoCase User-Agent ".*mafin.*" spambot=1
SetEnvIfNoCase User-Agent ".*MAFin.*" spambot=1
SetEnvIfNoCase User-Agent ".*Snapbot.*" spambot=1
SetEnvIfNoCase User-Agent ".*QihooBot.*" spambot=1
SetEnvIfNoCase User-Agent ".*Baiduspider.*" spambot=1
SetEnvIfNoCase User-Agent ".*baiduspider.*" spambot=1
SetEnvIfNoCase User-Agent ".*iaskspider.*" spambot=1
SetEnvIfNoCase User-Agent ".*Scanner.*" spambot=1
SetEnvIfNoCase User-Agent ".*IRLbot.*" spambot=1
SetEnvIfNoCase User-Agent ".*HTTrack.*" spambot=1
SetEnvIfNoCase User-Agent ".*MSNPTC.*" spambot=1
SetEnvIfNoCase Referer
SetEnvIfNoCase Referer
Deny from 207.210.101.49
Deny from 210.82.118.14
Deny from 208.77.96.98
Deny from 82.99.30
Deny from 207.210.101.4
Deny from 64.79.219.5
Deny from 193.1.100.110
Deny from 86.95.251.198
Deny from 87.210.41.139
Deny from 164.100.111.69
Deny from 81.168.228.218
Deny from 200.73.70.195
Deny from 66.16.63.44
Deny from 88.45.219.250
Deny from 194.7.161.130
Deny from 65.112.42.83
Deny from 209.191.123.34
Deny from 212.241.248.10
Deny from 88.151.114.33
Deny from 65.88.178.10
Deny from 66.195.77.130
Deny from 194.27.13.195
Deny from 141.11.234.60
Deny from 65.19.150
Deny from 6.234.139
Deny from 38.99.203.110
Deny from 204.9.204.202
Deny from 60.28.17.43
Deny from 210.173.180.145
Deny from 64.27.31.205
Deny from 137.82.84.97
Deny from 88.151.114.37
Deny from env=spambot
Allow from all