RCVD_IN_DNSWL_HI false positives

ioplex 3 years, 7 months ago

I just found out something interesting. Apparently dnswl.org can return false positives to:

"use additional methods to make them stop (ab)using our nameservers (namely, returning a „_HI“ result in the hope that whoever is responsible will finally notice)"

So dnswl.org is deliberately returning false positives that ALLOW spam to get through (SA will score the hit as RCVD_IN_DNSWL_HI, which for me is -5) so that people will stop abusing their free service.

Is this a known thing?

My DNS servers are linode.com of course. So I take it this is dnswl.org's way of encouraging linode.com to pay for this service?

Also, aren't the dnswl.org queries cached? Why would this be such a burden to them if it's all cached?

Strange.

Any insights would be appreciated. I'm still trying to figure out how to disable the dnswl.org queries (as opposed to just scoring them as 0 in local.cf).

Mike

4 Replies

_Brian 3 years, 7 months ago Linode Staff

This is the first I've seen about DNSWL returning false positives; however, we are aware that our public resolvers are likely to become blocked by DNSWL. The DNSWL service puts a limit of 100k requests per day per resolver public IP address, and the aggregate of Linode customers using our public resolvers can easily pass this limit, even by accident.

I reached out to the DNSWL team last summer to see what we could do to resolve this, and spoke to the same person who responded to this mailing list inquiry.
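To illustrate why a shared resolver trips a per-IP limit even when no single customer is a heavy user, here is a minimal sketch. The 100k figure comes from the post above; the customer counts, per-customer volumes, and IP addresses are made up for illustration, and the counting logic is an assumption about how such a limit might be applied, not DNSWL's actual implementation.

```python
from collections import Counter

DAILY_LIMIT = 100_000  # per-resolver daily limit quoted in the post above


def queries_by_resolver(queries):
    """Count queries per resolver IP, which is all the list operator can see.

    `queries` is an iterable of (customer, resolver_ip) pairs; the customer
    field is deliberately ignored because it never reaches the operator.
    """
    return Counter(resolver_ip for _customer, resolver_ip in queries)


def over_limit(counts, limit=DAILY_LIMIT):
    """Return the resolver IPs whose daily total exceeds the limit."""
    return sorted(ip for ip, total in counts.items() if total > limit)


# Hypothetical traffic: 300 customers share one resolver at 500 lookups a day
# each, while one customer runs a private resolver at 20,000 lookups a day.
shared = [(f"customer-{i}", "192.0.2.53") for i in range(300) for _ in range(500)]
private = [("customer-own", "198.51.100.7")] * 20_000

counts = queries_by_resolver(shared + private)
print(counts["192.0.2.53"])  # 150000
print(over_limit(counts))    # ['192.0.2.53']
```

In this made-up scenario the busiest individual user stays well under the limit, while the shared resolver exceeds it purely through aggregation.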

Their suggestion was to have our customers run their own resolvers:

In the email industry, as an inbound mail server, the best practices are to run and operate your own local resolver, if you are using third party RBL/RHSBL lists like ours classify as ( although ours is obviously going the opposite way :o) ).

Public resolvers, e.g. not only 100% public like Google, CloudFlare and OpenDNS, but also per ISP-locked resolvers like yours, that all of your customers can use, are likely to be blocked both from DNSWL and many other reputation related organisations (Spamhaus, URIBL, et cetera), due to too many queries.

From a technical perspective, people can literally hide behind such shared DNS servers, as there aren't really any ways for us to see through the queries, determining whether it is one or another organisation making them. All we can see is that the (shared) DNS server is making too many queries.

If the resolver(s) your customer changed to are still resolvers in the same way as the old, e.g. public for Linode customers, or 100% public, it is very likely for them to go the same way at some point, if they end up on showing a heavy usage as well, that becomes inconsistent with our terms.

For what it's worth, I agree with their logic. I also do not believe this to be a ploy to get Linode to pay for their service; I specifically asked if we should, and they said no. Their position is that a Linode license to use DNSWL would apply only to those within the Linode organization, and it would not apply to our customers.

If you want to continue using DNSWL with SpamAssassin, the solution is likely going to be to set up your own local resolver to perform DNSWL lookups.
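For anyone checking what their resolver gets back, here is a rough sketch of how a DNSWL lookup name is built and how an answer can be decoded. It assumes the `list.dnswl.org` zone and the `127.0.x.y` answer layout (x = category, y = trust level 0 to 3) as I understand them from dnswl.org's documentation; the function names are my own, and the actual DNS query is left out so that it can be run through whichever resolver you want to test.

```python
import ipaddress

DNSWL_ZONE = "list.dnswl.org"  # assumed zone name, per dnswl.org's docs
TRUST_LEVELS = {0: "none", 1: "low", 2: "medium", 3: "high"}


def dnswl_query_name(ip, zone=DNSWL_ZONE):
    """Build the DNS name to look up for an IPv4 address (octets reversed)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def decode_dnswl_answer(answer):
    """Split a 127.0.x.y answer into (category number, trust level name).

    Trust values outside 0-3 are reported as "unknown" rather than guessed at.
    """
    a, b, category, trust = (int(part) for part in answer.split("."))
    if (a, b) != (127, 0):
        raise ValueError(f"not a DNSWL-style answer: {answer}")
    return category, TRUST_LEVELS.get(trust, "unknown")


print(dnswl_query_name("192.0.2.1"))      # 1.2.0.192.list.dnswl.org
print(decode_dnswl_answer("127.0.10.3"))  # (10, 'high')
```

If every address you test comes back with a "high" trust level, including ones that obviously should not be listed, that would be consistent with the behaviour described in the first post.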

I'm spitballing here, but would stopping the lookups via /etc/hosts cause problems for your mail server? Also, though I've never configured SpamAssassin myself, in looking at DNSWL's instructions I'm inferring you can modify the score RCVD_IN_DNSWL_HI to 0 to stop these false positives from affecting the spam score.

stevewi 3 years, 7 months ago

@_Brian writes:

I'm spitballing here, but would stopping the lookups via /etc/hosts cause problems for your mail server? Also, though I've never configured SpamAssassin myself, in looking at DNSWL's instructions I'm inferring you can modify the score RCVD_IN_DNSWL_HI to 0 to stop these false positives from affecting the spam score.

Rather than a system based on heuristics, you could also use a different spam filtering mechanism that relies on statistical analysis and a local cache of known exemplars in each category (like crm114).

My primary filter is crm114… If crm114 can't come to a conclusion, the email is sent to spamassassin. If spamassassin comes to a conclusion, crm114 is trained with that conclusion. If spamassassin cannot come to a conclusion, the email is sent to the postmaster's Junk box for manual training.
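The flow described above can be sketched roughly as follows. This is not the Ruby script mentioned in this post; it is a Python illustration of the same cascade, and the three callables (`crm114_classify`, `spamassassin_classify`, `train_crm114`) are hypothetical stand-ins for however the real tools are invoked.

```python
def route_message(message, crm114_classify, spamassassin_classify, train_crm114):
    """Return (verdict, decided_by) for one message.

    Each classifier is a callable returning "spam", "ham", or None when it
    cannot reach a conclusion. All three callables are hypothetical stand-ins.
    """
    # First stage: the statistical filter gets the first look.
    verdict = crm114_classify(message)
    if verdict is not None:
        return verdict, "crm114"

    # Second stage: fall back to SpamAssassin and feed its conclusion
    # back into the statistical filter as training data.
    verdict = spamassassin_classify(message)
    if verdict is not None:
        train_crm114(message, verdict)
        return verdict, "spamassassin"

    # Neither filter decided: leave the message for manual training.
    return None, "manual"


# Tiny demonstration with stub classifiers.
trained = []
result = route_message(
    "example message",
    crm114_classify=lambda m: None,           # crm114 is unsure
    spamassassin_classify=lambda m: "spam",   # SpamAssassin decides
    train_crm114=lambda m, v: trained.append((m, v)),
)
print(result)   # ('spam', 'spamassassin')
print(trained)  # [('example message', 'spam')]
```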

crm114 is very accurate and it trains quickly so my spamd(8) doesn't do a lot.

All this is strung together with a simple Ruby script and it runs as part of the mail delivery pipeline using dovecot sieve.

I've been using this basic scheme since the 1990s so the crm114 sparse spectra files (spam.css & nonspam.css) have a lot of data in them. Depending on the recipient, the cache can be 7-10 days of the recipient's messages (the cache is cleaned every night).

-- sw

millisa 3 years, 7 months ago

Quick note about setting a SpamAssassin test score to zero: this disables the rule from running. If you want the rule to keep running but have no real impact on the overall score, set it to a very tiny number instead (0.01 or -0.01 works).

Reference: https://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html#item_score_symbolic_test_name_n_2enn__5b_n_2enn_n_2enn_
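To make the difference concrete, here is a toy model of the behaviour described above. It is not SpamAssassin's code; `score_message` and `HYPOTHETICAL_SPAM_RULE` are invented for illustration, and the only thing it tries to mirror is that a rule scored 0 never runs while a rule scored -0.01 still runs and is still reported.

```python
def score_message(hits, scores):
    """Toy model: sum the scores of rules that hit, skipping rules scored 0.

    `hits` is the set of rule names that would match the message and `scores`
    maps rule names to configured scores. Returns (total, reported_rules).
    """
    total = 0.0
    reported = []
    for rule in sorted(hits):
        score = scores.get(rule, 0.0)
        if score == 0:
            continue  # a zero score means the rule is never evaluated
        total += score
        reported.append(rule)
    return round(total, 2), reported


hits = {"RCVD_IN_DNSWL_HI", "HYPOTHETICAL_SPAM_RULE"}

# Original setup from the first post: the whitelist hit subtracts 5 points.
print(score_message(hits, {"HYPOTHETICAL_SPAM_RULE": 6.0, "RCVD_IN_DNSWL_HI": -5.0}))
# (1.0, ['HYPOTHETICAL_SPAM_RULE', 'RCVD_IN_DNSWL_HI'])

# Score 0: the rule is disabled and disappears from the report.
print(score_message(hits, {"HYPOTHETICAL_SPAM_RULE": 6.0, "RCVD_IN_DNSWL_HI": 0}))
# (6.0, ['HYPOTHETICAL_SPAM_RULE'])

# Score -0.01: the rule still runs and is reported, with negligible effect.
print(score_message(hits, {"HYPOTHETICAL_SPAM_RULE": 6.0, "RCVD_IN_DNSWL_HI": -0.01}))
# (5.99, ['HYPOTHETICAL_SPAM_RULE', 'RCVD_IN_DNSWL_HI'])
```

Keeping the rule alive with a tiny score means the hit still shows up in the report, which is handy for seeing how often the false positives occur.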