Rampant Web Spider

Has anyone else been having trouble with massive numbers of connections from a web spider called Twiceler? I'm getting hit literally thousands of times per "session", and after going and checking my logs, it seems it's been happening for months. Each time it goes into one of its fits, it comes from the same IP, but the IPs are hardly ever the same between rampant attempts.

They're not actually loading pages, though; the bots, or at least certain ones, are getting hung up on a 302 when my page redirects to add www. to the URL. Thank goodness they're not actually loading the root of my website that many times, or I'm pretty sure it'd be unusable. I also came to the conclusion today that this probably means they can't even fetch robots.txt (since they can't get past the redirection), where they've been blocked for a while now. What's odd is that I've seen certain Twiceler bots actually crawling my site properly.

I wrote to the company before, but nothing's been done. They asked for my log files despite my telling them exactly what the problem was, with plenty of detail on attempts and IPs. The log file was full of nearly identical lines from identical attempts, sometimes even with the same timestamp since it happens so quickly (as I explained and showed them), but I sent them the huge log of thousands of nearly identical lines anyway.

I even checked Google a moment ago, and it appears I'm not the only one getting hit by this thing.

How annoying is it that people can't control their bots, and don't even pull them down despite knowing they have a problem? I'd bet this happens to a lot of folks, many without even realizing it, since a lot of websites have redirections in place. I really don't know what to do about it short of banning the dozens of IPs I've seen so far, or asking to be added to the block list the guy mentioned. But that's not a solution, especially if it's still affecting others.

I wrote them again today because I'm pretty much growing tired of it (over 4,000 attempts to crawl my site cluttering up my logs when I got up this morning), but I won't hold my breath after all the months this problem has gone on.

3 Replies

Can you make everything redirect but the robots.txt file? Maybe that would solve your problem for bots that cannot understand redirects.
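Something like this mod_rewrite sketch might do it (untested; it assumes mod_rewrite is enabled, and www.example.com stands in for your canonical host):

```apache
# Sketch: send everything to the www. host, but leave robots.txt
# reachable on the bare domain so bots stuck on the redirect can
# still read it. Assumes mod_rewrite; www.example.com is a placeholder.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```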

@harmone:

Can you make everything redirect but the robots.txt file? Maybe that would solve your problem for bots that cannot understand redirects.

That's a good idea!

Though I just got an email back from them, and they said that the IP from this morning, and many of the others in the log file I sent before, weren't theirs. They've also put a list of their IPs on their site, and those all resolve to *.cuill.com. So that's odd; according to them, it seems many bots are masquerading as theirs.

If that's the case, I almost feel bad for thinking poorly of them! But now I'm left to try to ban all those other IPs, I guess, since they're fakes, and probably the same fakes causing problems for others too.

EDIT: Hmm, checking Google, it seems other people having problems are getting them from the real bots. So I really dunno what to think anymore.
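If the fakes really are forging Twiceler's user agent, maybe I could key off reverse DNS instead of chasing IPs. Something like this (untested, just a sketch; it needs HostnameLookups On, which costs an extra DNS lookup on every request) could sit next to my existing rules:

```apache
# Untested sketch: flag any client claiming to be Twiceler, then
# clear the flag if it genuinely reverse-resolves to cuill.com.
# Requires HostnameLookups On so Remote_Host is available.
SetEnvIfNoCase User-Agent "Twiceler" fake_twiceler=1
SetEnvIfNoCase Remote_Host "cuill\.com$" !fake_twiceler
Deny from env=fake_twiceler
```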

My current top-level .htaccess:

```apache
SetEnvIfNoCase User-Agent ".*noxtrumbot.*" spambot=1
SetEnvIfNoCase User-Agent ".*Indy Library.*" spambot=1
SetEnvIfNoCase User-Agent ".*Zeus.*" spambot=1
SetEnvIfNoCase User-Agent ".*linko.*" spambot=1
SetEnvIfNoCase User-Agent ".*imagefetch.*" spambot=1
SetEnvIfNoCase User-Agent ".*urniti.*" spambot=1
SetEnvIfNoCase User-Agent ".*kuloko-bot.*" spambot=1
SetEnvIfNoCase User-Agent ".*nameprotect.*" spambot=1
SetEnvIfNoCase User-Agent ".*grub-client.*" spambot=1
SetEnvIfNoCase User-Agent ".*WebCopier.*" spambot=1
SetEnvIfNoCase User-Agent ".*Zyborg.*" spambot=1
SetEnvIfNoCase User-Agent ".*WebZIP.*" spambot=1
SetEnvIfNoCase User-Agent ".*Downloader.*" spambot=1
SetEnvIfNoCase User-Agent ".*Ninja.*" spambot=1
SetEnvIfNoCase User-Agent ".*OmniExplorer_Bot.*" spambot=1
SetEnvIfNoCase User-Agent ".*omni-explorer.*" spambot=1
SetEnvIfNoCase User-Agent ".*NG/2.0.*" spambot=1
SetEnvIfNoCase User-Agent ".*WebStripper.*" spambot=1
SetEnvIfNoCase User-Agent ".*mafin.*" spambot=1
SetEnvIfNoCase User-Agent ".*MAFin.*" spambot=1
SetEnvIfNoCase User-Agent ".*Snapbot.*" spambot=1
SetEnvIfNoCase User-Agent ".*QihooBot.*" spambot=1
SetEnvIfNoCase User-Agent ".*Baiduspider.*" spambot=1
SetEnvIfNoCase User-Agent ".*baiduspider.*" spambot=1
SetEnvIfNoCase User-Agent ".*iaskspider.*" spambot=1
SetEnvIfNoCase User-Agent ".*Scanner.*" spambot=1
SetEnvIfNoCase User-Agent ".*IRLbot.*" spambot=1
SetEnvIfNoCase User-Agent ".*HTTrack.*" spambot=1
SetEnvIfNoCase User-Agent ".*MSNPTC.*" spambot=1
SetEnvIfNoCase Referer www.addresses.com spambot=1
SetEnvIfNoCase Referer www.bwdow.com spambot=1

Order allow,deny
Deny from 207.210.101.49
Deny from 210.82.118.14
Deny from 208.77.96.98
Deny from 82.99.30
Deny from 207.210.101.4
Deny from 64.79.219.5
Deny from 193.1.100.110
Deny from 86.95.251.198
Deny from 87.210.41.139
Deny from 164.100.111.69
Deny from 81.168.228.218
Deny from 200.73.70.195
Deny from 66.16.63.44
Deny from 88.45.219.250
Deny from 194.7.161.130
Deny from 65.112.42.83
Deny from 209.191.123.34
Deny from 212.241.248.10
Deny from 88.151.114.33
Deny from 65.88.178.10
Deny from 66.195.77.130
Deny from 194.27.13.195
Deny from 141.11.234.60
Deny from 65.19.150
Deny from 6.234.139
Deny from 38.99.203.110
Deny from 204.9.204.202
Deny from 60.28.17.43
Deny from 210.173.180.145
Deny from 64.27.31.205
Deny from 137.82.84.97
Deny from 88.151.114.37
Deny from env=spambot
Allow from all
```
