How Do I Eliminate Bad IPs Without Affecting True Visitors' IPs On My Website

I was wondering if there is a way to get the best of both worlds: eliminating bot and spammer IPs while still allowing true visitors' IPs on my website.

I have a WordPress site with a security plugin installed. I can see the high volume of spammers and bots hitting my website. As much as I would like to whitelist only my own IP using iptables to eliminate this unnecessary noise, that isn't practical: it would block all other traffic, which I obviously don't want for my business. I am trying to make sales on my website, so I can't block my customers' IP addresses.

Is there an app or tool or function that I can set up to automatically filter out and block bad IPs while preserving the good ones?

I know about fail2ban but that's more useful for brute force attacks at a login page. I don't think it would be relevant if someone is just visiting my website without trying to get access to the sign in page.

As this is a very pervasive nuisance in the Internet world, I would appreciate any help immensely!

8 Replies

How are these bots and spammers impacting your business? What is the actual problem you're trying to prevent?

As far as web servers go, I use a list of bad bots. See: https://github.com/mitchellkrogza . It's enforced by the web server (the traffic would be blocked before WordPress even knew it was there) and updated every night. It's easy to set up and pretty effective. It may give you some relief.

I also use blacklists…a far more draconian measure. It's probably not the most elegant solution (and the blacklists themselves consume kernel memory) but it works for me.

I have a cron job that collects blacklisted data from a number of sources -- geographical data, bogons, ASNs (autonomous system numbers; see https://www.whatismyip.com/asn ) and sets of individual IP addresses (mostly from iplists.firehol.org…but there are a few other sources). All these lists come in IPv4 and IPv6 flavors.

Once collected, the data is sorted into IPv4 and IPv6 lists. The following actions are then performed to condense each list:

  • networks (represented by CIDR ranges) are formed from individual IP addresses if possible, then the individual IP addresses are tossed out of the list;

  • remaining individual IP addresses are scanned to see if they belong to a network already in the list -- if so, the IP addresses are tossed out; and

  • lastly, networks can overlap so they are scanned to see if they can be condensed into larger networks -- if so, the larger network is retained and the smaller ones are tossed out.
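For anyone who wants to experiment, the condensing steps above can be approximated with Python's standard `ipaddress` module. This is only a minimal sketch under my own assumptions, not the code described in this thread: the function name `condense` and the sample addresses (documentation ranges) are made up for illustration, and `collapse_addresses` only merges addresses and networks that combine exactly into a larger CIDR block.

```python
import ipaddress

def condense(entries):
    """Collapse a mix of single addresses and CIDR ranges into a minimal list.

    Returns a pair (ipv4_networks, ipv6_networks).
    """
    v4, v6 = [], []
    for entry in entries:
        # strict=False tolerates entries with host bits set, e.g. "10.0.0.5/24"
        net = ipaddress.ip_network(entry.strip(), strict=False)
        (v4 if net.version == 4 else v6).append(net)
    # collapse_addresses merges adjacent networks and drops anything already
    # covered by a larger network; it accepts only one IP version per call.
    return (list(ipaddress.collapse_addresses(v4)),
            list(ipaddress.collapse_addresses(v6)))

# Hypothetical sample data using documentation address ranges.
sample = [
    "192.0.2.0", "192.0.2.1",                 # two adjacent hosts
    "198.51.100.0/25", "198.51.100.128/25",   # two halves of a /24
    "198.51.100.7",                           # host already inside that /24
    "2001:db8::/33", "2001:db8:8000::/33",    # two halves of a /32
]
nets4, nets6 = condense(sample)
print([str(n) for n in nets4])  # ['192.0.2.0/31', '198.51.100.0/24']
print([str(n) for n in nets6])  # ['2001:db8::/32']
```

The sample shows all three steps at once: hosts combine into a network, a host inside an existing network disappears, and adjacent networks merge into a larger one.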

The goal of all the steps above is to create the smallest possible lists that still block all the entries. There are four lists (organized this way because ipset(1) hashes networks and individual IP addresses differently):

  • bl-nodes4 (a blacklist of individual IPv4 addresses);
  • bl-nodes6 (a blacklist of individual IPv6 addresses);
  • bl-nets4 (a blacklist of IPv4 networks); and
  • bl-nets6 (a blacklist of IPv6 networks).
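As a rough illustration of that four-way split, here is a minimal Python sketch. The set names come from the list above; the function name `partition` and the sample entries are hypothetical, and the rule "a full-length prefix means an individual address" is my assumption about how the sorting could be done.

```python
import ipaddress

def partition(entries):
    """Sort entries into the four lists by IP version and host-vs-network."""
    sets = {"bl-nodes4": [], "bl-nodes6": [], "bl-nets4": [], "bl-nets6": []}
    for entry in entries:
        net = ipaddress.ip_network(entry, strict=False)
        # A /32 (IPv4) or /128 (IPv6) is a single host, anything shorter is a network.
        kind = "nodes" if net.prefixlen == net.max_prefixlen else "nets"
        name = f"bl-{kind}{net.version}"
        # Hosts are stored as bare addresses, networks in CIDR form.
        sets[name].append(str(net.network_address) if kind == "nodes" else str(net))
    return sets

# Hypothetical sample data using documentation address ranges.
result = partition(["192.0.2.7", "198.51.100.0/24", "2001:db8::1", "2001:db8::/32"])
for name, members in result.items():
    print(name, members)
```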

Each of these lists is set into a table managed by ipset(1). The firewall configuration has rules to block inbound traffic from all entries in the tables. Here are the list sizes at the moment:

Name: bl-nodes4
Type: hash:ip
Revision: 4
Header: family inet hashsize 65536 maxelem 166992
Size in memory: 3265048
References: 4
Number of entries: 166928

Name: bl-nodes6
Type: hash:ip
Revision: 4
Header: family inet6 hashsize 1024 maxelem 70
Size in memory: 680
References: 4
Number of entries: 6

Name: bl-nets4
Type: hash:net
Revision: 6
Header: family inet hashsize 16384 maxelem 45819
Size in memory: 1143448
References: 4
Number of entries: 45755

Name: bl-nets6
Type: hash:net
Revision: 6
Header: family inet6 hashsize 8192 maxelem 20301
Size in memory: 936888
References: 4
Number of entries: 20237

Here are the stats at the moment for the amount of traffic blocked (from iptables(8)…the first number is the number of packets & the second is the number of bytes):

  • IPv4
 250K   12M BLACKLIST.bi.1.in match-set bl-nets4 src 
 114K 5500K BLACKLIST.bi.3.in match-set bl-nodes4 src
  • IPv6
  254 16956 BLACKLIST.bi.2.in match-set bl-nets6 src
    0     0 BLACKLIST.bi.4.in match-set bl-nodes6 src

The ipset(1) tables are updated every four hours. Surprisingly, the collection and condensing steps only take a few seconds. Most of the data comes from files by pulling from the git(1) repository at https://github.com/firehol/blocklist-ipsets.git .

Once the lists are cooked, the process of installing them takes less than a second. I've used some variation of this system for several years (the collection and condensing steps have undergone quite a few changes to improve performance and to reduce the size of the resulting lists).
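One common way to get a fast install like that is to build a temporary set and swap it into place. The sketch below is an assumption about how this could be done, not a description of the scripts discussed here: it generates text for `ipset restore` using the `create`, `add`, `swap` and `destroy` commands, the `-tmp` suffix and the `headroom` parameter are hypothetical, and it assumes the live set already exists with the same type and family (which `swap` needs).

```python
def ipset_restore_script(name, set_type, family, entries, headroom=64):
    """Build input for `ipset restore` that replaces the contents of `name`.

    The new entries are loaded into a temporary set, which is then swapped
    with the live set so the firewall rules never see a half-filled table.
    """
    tmp = f"{name}-tmp"                 # hypothetical temporary set name
    maxelem = len(entries) + headroom   # leave a little room above the entry count
    lines = [f"create {tmp} {set_type} family {family} maxelem {maxelem}"]
    lines += [f"add {tmp} {entry}" for entry in entries]
    lines += [f"swap {tmp} {name}",     # exchange contents with the live set
              f"destroy {tmp}"]         # the temporary set now holds the old data
    return "\n".join(lines) + "\n"

# Hypothetical sample data using documentation address ranges.
script = ipset_restore_script("bl-nets4", "hash:net", "inet",
                              ["198.51.100.0/24", "203.0.113.0/24"])
print(script, end="")
```

The output would be piped to `ipset restore` as root. Test it against a scratch set first, since a mistake here can lock out legitimate traffic.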

I can block traffic from whole countries using this scheme…at present, I block traffic from the following countries (see https://www.worldatlas.com/aatlas/ctycodes.htm ):

CN, HK, RU and IN (each plus proxies & relays that operate in other places…blocked by ASN), MO, VN, NP, KR, UA, BY, TW, PK, BR, SC, ID, IR, IQ, AR.

I have different reasons for including a country on this list…and the list is dynamic based on conditions I monitor. I have a command-line tool to manage the configuration file that manages the blacklist data sources.

Like I said, it may not be the most elegant solution but it works well for me. That being said, it may not work for you. I'm always open to suggested improvements.

My Linode is primarily a small, limited-account mail server. Most of the blocked countries generate prodigious amounts of spam.

-- sw

@Woet. The benefit would be less noise and less resource consumption on Apache. Every time someone visits my web page, Apache does work. If thousands of bots are hitting my web server, that can mean downtime.

Plus, I just don't like the feeling of having intruders knocking on my "door" even though I have "locks" in place.

Would you like if a random person knocked on your door every day?

@stevewi I am blown away by your in-depth response once again!

Thank you from the bottom of my heart for taking time out of your busy schedule to systematically walk me through the process of setting this up.

To be frank, your blacklist strategy is wayyyyy above my knowledge span. It's a very interesting strategy, though, worth pursuing in the near future. For now, this will be cumbersome as I don't even know where to start.

I am also ecstatic about the idea of blocking per country, not just per ASN. I thought ASN was the highest level in the hierarchy for blocking a bucket of IPs rather than individual addresses. This is purely a work of art and I am definitely going to look into it.

I know right off the bat, I don't want certain countries to access my site like China, Russia, Middle East, etc. I understand you have different reasons for including a country on your list and the list is dynamic based on conditions you monitor.

But since this is too advanced for me and given my rudimentary knowledge of Linux, I wanted to get your feedback. Would you say for NOW, blocking bad bots using https://github.com/mitchellkrogza and blacklisting by country code would be the way to go?

Thanks again man for sharing your extensive knowledge in this topic.

So, bad 'bot blocking is not per-country. It's a different scheme entirely -- based on a 'bot signature…not IP address.
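To make the idea of a signature concrete, here is a minimal Python sketch that checks a request's User-Agent string against a list of known bad substrings. It is only an illustration under my own assumptions: the pattern names are invented, the function names are hypothetical, and the real bad-bot lists are enforced inside the web server configuration rather than in Python.

```python
import re

# Hypothetical signatures; a real list would hold many more entries.
BAD_BOT_PATTERNS = ["examplescraper", "spamcrawler"]

def compile_signatures(patterns):
    """Combine the literal patterns into one case-insensitive regex."""
    if not patterns:
        return None  # an empty alternation would match every request
    return re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)

def is_bad_bot(user_agent, signatures):
    """Return True if the User-Agent contains any known bad signature."""
    if signatures is None:
        return False
    return bool(signatures.search(user_agent or ""))

signatures = compile_signatures(BAD_BOT_PATTERNS)
print(is_bad_bot("Mozilla/5.0 (compatible; ExampleScraper/2.1)", signatures))  # True
print(is_bad_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0", signatures))  # False
```

Note that the decision depends only on what the client says about itself, so the same 'bot is caught no matter which IP address or country it connects from.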

I wrote all the code for all the blacklist stuff myself…and am willing to give it away. However, I'm not willing to support it beyond my own needs. It's a combination of Ruby code, shell scripts and only has one package dependency -- grepcidr(1) -- which, most likely, you can install as a package. It also depends on some code I wrote to build/access free GeoIP data from maxmind.com. I can give you that too. The GeoIP code depends on sqlite3. There's a setup script for each project that's pretty easy to follow (the blacklist one is more complicated). Each setup script has a --dry-run option so you can analyze what's going on and where things are going without actually doing anything.

There are some custom mods to fail2ban(1) rules as well… You can easily leave these out.

You'll have to figure out how to hook the ipsets into your firewall by yourself as well. I can help with that but this process is largely going to be dependent on how your firewall is set up. I use firehol(1) (http://firehol.org) as a high-level firewall configuration tool. You may use ufw…I don't know anything about that.

firehol(1) has a systemd(1) service. I wrote a systemd service that installs the blacklists after reboots (specifically, after the firehol(1) service starts). You'll have to figure out how to install systemd(1) services as well.

If we can figure out a way to exchange email addresses through a trusted third party, I can provide the source code and answer a limited number of questions about how it works. I really don't want to put it on github.com because I don't want to encourage support requests from random members of the public.

Since Linode knows how to contact both of us, maybe we can convince them to be the secure email address exchange mechanism. What say you, Linode?

-- sw

P.S. You still need to block by ASN. The Chinese and Russians have "subsidiaries" that operate in other geographic locations that are full of 'bots and spammers as well (for example, the Russians in Gibraltar and the Chinese in Macau and the US…every spammer in the world seems to use the Seychelles and the Cayman Islands). The only way to block these organizations…especially in the US…is by ASN or individual network/node.

stevewi: I talked about this with the team since your request is the first I've come across of that nature. We think it's probably best if we don't set that sort of precedent, although it's clear your intent is noble.

May I suggest a private Github repo? Or perhaps a simple pastebin with the source may work.

Also it's worth noting that Googling your handle provides your email address in the top result ;)

Thanks @_Brian. Since my gmail address is well-known, @digitaljoegeorge, send me an email with your email address to caponecicero@gmail.com and I'll send you compressed tars of the source code for all of the above.

We can use those email addresses to exchange questions/answers, etc.

FWIW, this is my "burner" address…

-- sw

Thank you so much Steve. You are the best. Thank you Linode once again for your help and contribution.

You guys rock!

Sending email now Steve…
