DDoS from Anthropic AI
Going by Twitter, I'm not the only one who's seeing a huge amount of bot traffic coming from AI company Anthropic. Their bot ("ClaudeBot") doesn't respect robots.txt or ai.txt, and they appear to be using hundreds of Amazon AWS servers to scrape content, enough to overwhelm my servers last night.
Just in case anyone else is having issues and wants to block them at the firewall level, these are the ranges the company is using based on my server logs:
3.12.0.0/16
3.14.0.0/15
3.20.0.0/14
3.128.0.0/15
3.132.0.0/14
3.136.0.0/13
3.144.0.0/13
13.58.0.0/15
18.116.0.0/14
18.188.0.0/16
18.189.0.0/16
18.190.0.0/16
18.191.0.0/16
18.216.0.0/14
18.220.0.0/14
18.224.0.0/14
52.14.0.0/16
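If you want to drop those ranges at the host firewall itself, here's a rough sketch using ipset and iptables (adapt it to nftables, ufw or firewalld if that's what you run, and note these runtime rules won't survive a reboot unless you persist them):

# create a set that can hold CIDR ranges
ipset create claudebot hash:net
# add the ranges from the list above
for net in 3.12.0.0/16 3.14.0.0/15 3.20.0.0/14 3.128.0.0/15 3.132.0.0/14 \
    3.136.0.0/13 3.144.0.0/13 13.58.0.0/15 18.116.0.0/14 18.188.0.0/16 \
    18.189.0.0/16 18.190.0.0/16 18.191.0.0/16 18.216.0.0/14 18.220.0.0/14 \
    18.224.0.0/14 52.14.0.0/16; do
    ipset add claudebot "$net"
done
# drop any inbound traffic from addresses in the set
iptables -I INPUT -m set --match-set claudebot src -j DROP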
Although it can definitely be frustrating to deal with performance issues caused by bad actors, this seems like the perfect use case for our Cloud Firewall. Since it is external to your Linode, the firewall rules/processing are handled by our system instead of using an individual Linode's finite virtual resources.
Since you've already identified the source IP ranges, those subnets can easily be added as block rules to prevent that traffic from reaching your Linode. For more information about configuring Cloud Firewalls and applying them to existing Linodes, be sure to check out our Cloud Firewall Docs guides.
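If you prefer to script it, a firewall with these block rules can also be created through the Linode API. Here's a rough sketch with curl: only the first few ranges are shown (add the rest from the list above), the token and Linode ID are placeholders, and the exact payload fields should be double-checked against the Cloud Firewall API documentation:

curl -s -X POST https://api.linode.com/v4/networking/firewalls \
  -H "Authorization: Bearer $LINODE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "label": "block-claudebot",
    "rules": {
      "inbound_policy": "ACCEPT",
      "outbound_policy": "ACCEPT",
      "inbound": [
        {
          "label": "drop-claudebot-ranges",
          "action": "DROP",
          "protocol": "TCP",
          "ports": "80,443",
          "addresses": { "ipv4": ["3.12.0.0/16", "3.14.0.0/15", "3.20.0.0/14"] }
        }
      ],
      "outbound": []
    },
    "devices": { "linodes": [12345] }
  }'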
Otherwise, although it doesn't apply in this instance, if you ever detect malicious activity originating from another Linode, please be sure to submit a report through our Abuse Portal. This helps streamline our Trust & Safety Team's ability to review activity and take action as needed to help protect customers on our platform and users of non-Linode systems worldwide.
There's some more background on this here -- Amazon have poured money into the company and seemingly this is being used to spin up hundreds of AWS servers to start grabbing content from websites.
https://www.theguardian.com/technology/2024/mar/27/anthropic-amazon-ai-startup
In the end, I saw around 700 different IP addresses hammering my websites over a couple of hours, all with a user agent string containing "ClaudeBot/1.0" and within the ranges above. There are some suggestions online to use a robots.txt file to block the bot, but based on the URLs requested I can confirm that it ignores anything in the file.
Assuming you don't want a multi-billion dollar company stealing your content and giving you nothing back in return, it's definitely worth looking at options to configure your web server software to block requests based on the user agent string. I've combined that with fail2ban to generate a firewall-level block for each IP address used by the bot.
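To get a feel for how hard the bot is hitting your own server, something like this against the web server access log works. The log path here assumes a default nginx setup, so adjust it for Apache or a custom location:

# count the distinct IPs that have sent a ClaudeBot user agent
grep -i 'claudebot' /var/log/nginx/access.log | awk '{print $1}' | sort -u | wc -l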
I'm having the same issue. Because it's a DDoS, it's not easy to block. I'm currently looking for a way to block it via the user agent or some other string.
I added this to .htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
# Match any request whose user agent contains "ClaudeBot"
RewriteCond %{HTTP_USER_AGENT} ^.*ClaudeBot.*$
# but still let it fetch robots.txt
RewriteCond %{REQUEST_URI} !robots\.txt
# Redirect everything else it asks for back to anthropic.com
RewriteRule .* http://www.anthropic.com [R,L]
</IfModule>
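If you'd rather refuse the request outright than bounce it back to anthropic.com, swapping the final rule for the one below returns a 403 Forbidden instead:

RewriteRule .* - [F,L]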
My setup is a little more complex and uses a combination of nginx and fail2ban.
For nginx, I've got a .conf file containing "bad bots" which is loaded as part of the overall web server config:
map $http_user_agent $blocked_bots {
    # default: not a bad bot
    default 0;
    # "~" entries are case-sensitive regex matches against the user agent
    "~Claude-Web" 1;
    "~ClaudeBot" 1;
}
…you can add extra lines to block other bots based on whatever is unique in their user agent string. Within individual server { ... }
sections, you can then choose to do something if the requester is a bad bot, e.g. to return no response at all to the request:
if ( $blocked_bots = 1 ) { return 444; }
As an alternative to returning a 444, you could return a 403 response to let the bot know that you're forbidding access to the requested URL.
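Putting it together, here's a stripped-down example of where the check sits (the server_name and the rest of the site configuration are just placeholders):

server {
    listen 80;
    server_name example.com;

    # refuse bad-bot requests before any location block is reached
    if ( $blocked_bots = 1 ) { return 444; }

    # ...the rest of the normal site configuration...
}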
You can then create a new fail2ban jail that monitors the nginx access log for requests that result in a 444 response and then blocks the IP at the server firewall level for a specific period of time. You could either create a regex that matches the response code (in this case 444, if you wanted to block all bad bots) or a regex that just matches specific user agent strings.
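As a rough sketch of that jail, this variant watches the access log, where nginx records the 444 status; the filenames, regex and ban times are just examples to adapt:

# /etc/fail2ban/filter.d/nginx-badbots-444.conf
[Definition]
failregex = ^<HOST> -.*"(GET|POST|HEAD).*" 444
ignoreregex =

# added to /etc/fail2ban/jail.local
[nginx-badbots-444]
enabled  = true
port     = http,https
filter   = nginx-badbots-444
logpath  = /var/log/nginx/access.log
maxretry = 1
findtime = 600
bantime  = 86400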
The benefit of the above setup is that the bot only gets the one chance to make a request before its IP address is blocked on the firewall.
The following single line command is a useful way of monitoring which IP addresses are currently blocked by fail2ban:
fail2ban-client status | grep "Jail list:" | sed "s/ //g" | awk '{split($2,a,",");for(i in a) system("fail2ban-client status " a[i])}' | grep "Status\|IP list"
I'm wondering if Linode/Akamai is going to start blocking this stuff as part of the built-in DDoS protection. At some point, it seems it's going to become absolutely necessary.
My public Gitea server was getting annihilated by ClaudeBot repeatedly scraping the same content without ceasing until I blocked it, and even then I was still getting repeatedly scraped by other, slightly better-behaved, bots. That isn't what I wanted to do, but there's not really another way (that I can see) to put an end to the endless scraping.
This isn't legitimate traffic. If it's training AI or doing anything other than search engine indexing, it should be treated like the DDoS attack that it is. Regardless of the intent of the traffic, it's untenably abusive to both Linode's infrastructure and customers.