Warren Togami wrote:
> Overlap analysis shows the majority of XBL and PBL are also listed by
> Barracuda. Furthermore Barracuda's list seems to have a similar hit
> % as XBL + PBL combined. Is Barracuda known to aggregate Spamhaus
> data with their own? If so we might be adding redundant scores in a
> dangerous and undesirable manner.
>
> Adam Katz sa-update channels contains DNSBL rule overlap adjustments
> in an attempt to compensate for what he calls "incestuous"
> blacklists. I am beginning to think this is a good idea to explore
> for spamassassin upstream if in fact one blacklist is aggregating
> data from another blacklist.

I should say more about my overlap rules ("overlap" being the PC name for what I called "incestuous" in earlier versions and in comments).

I've noticed that a lot of these blocklists overlap heavily on the same ham. Some of them syndicate common upstream sources, but more importantly, they are populated by the same methods. Spam traps are limited in what they can pick up while still staying pure: list subscription followed by unsubscription, catch-all accounts on guessable or subtly "advertised" domains, cleaned-up stale email accounts, feeding addresses to spam bots, and perhaps a few other tricks. This fishing for spam will lure the same spammers across the board, thus the overlap.

This overlap is a problem because some spammers are smart enough to cycle through relays and hope for one known (rightly or not) for sending ham, or at least *not* known for sending spam. Overlap from DNSBLs can completely kill ham, and I think a multifaceted system like SpamAssassin should not apply 5+ points (out of 5) to a message solely from DNSBLs** when there are so many other tools available. Real spam will bump into something else.

That brings me to a big pet peeve of mine on DNSBLs: they 'clean' themselves of this problem by using DNSWLs ... and spammers know this. The 'whitelisting' supplied by a DNSWL is in my opinion not appropriate for a DNSBL to use. Instead, a DNSBL-dedicated reference is needed, perhaps even one that is not publicly available. As to how such a thing would be populated ... that's a great question. If it's anything that could be publicly accessible, I'd prefer DNSBLs to either use NOTHING and let their users cross-check, or else use a different return code to indicate the hit so that I can act on it anyway. *Especially* while DNSWLs lack an abuse-reporting mechanism.

I have seen SO much DNSWL'd spam that I've had to migrate to using confirmation: the DNSWL-level equivalent of whitelist_from vs whitelist_auth.

In my khop-bl sa-update channel, any DNSWL'd message that doesn't pass DKIM or SPF gains a point, while any that does pass loses 2.25 (unless it's already been lowered by overlapping DNSWL scores). ... actually, I'm surprised I gave it such a swing given spammers' increasing use of SPF and DKIM.

** Another pet peeve: Mail should not be able to be marked as spam from a single category of detection mechanisms, aside from blacklists and perhaps a fully trained and moderated learning algorithm. I'd like to set a hard cap per mechanism category of something like 3.5, perhaps 4.0 for something dynamically generated by incoming data (e.g. Bayes, AWL), but SA makes this kind of capping *really* hard. DNSBL/URIBL/DNSWLs are the only place where this sticks out enough for me to have remedied it.

My iXhash rule is specifically designed to avoid this exact problem. It uses the plugin's default of 0.1 per server hit and makes their union the rule that gets the larger share of the points. If I had masscheck results, some servers' scores might go up, but the bulk would still be applied by the meta rule. SA 3.4 (or 3.3 if it's not too late...) should (IMHO) include that sort of mechanism for DNSBLs. Not quite a cap, but close enough.

The overlap rules in question are a part of my khop-bl channel, which is published at http://khopesh.com/Anti-spam#sa-update_channels not too far above my iXhash meta rule, which now includes the workaround update discussed here not too long ago.
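
To make the per-category cap mentioned above concrete, here is a minimal Python sketch. It is not SpamAssassin code and not how the khop-bl rules are implemented; the category names, the function `capped_score`, and the example scores are all made up for illustration. Only the cap values (3.5, and 4.0 for data-driven categories like Bayes and AWL) come from the text.

```python
from collections import defaultdict

# Caps floated in the text; everything else here is a hypothetical illustration.
DEFAULT_CAP = 3.5
DYNAMIC_CAP = 4.0
DYNAMIC_CATEGORIES = {"bayes", "awl"}  # assumed labels for data-driven mechanisms


def capped_score(hits):
    """Sum rule scores, limiting how much any one category can contribute.

    hits: iterable of (category, score) pairs for the rules that fired.
    """
    subtotals = defaultdict(float)
    for category, score in hits:
        subtotals[category] += score

    total = 0.0
    for category, subtotal in subtotals.items():
        cap = DYNAMIC_CAP if category in DYNAMIC_CATEGORIES else DEFAULT_CAP
        # min() only limits the spammy direction; negative subtotals pass through.
        total += min(subtotal, cap)
    return total


if __name__ == "__main__":
    # Three overlapping DNSBL hits (6.0 uncapped) plus one small body rule.
    hits = [("dnsbl", 2.0), ("dnsbl", 1.5), ("dnsbl", 2.5), ("body", 1.0)]
    print(capped_score(hits))  # 4.5
```

With the cap in place, the three overlapping DNSBL hits contribute 3.5 instead of 6.0, so the message stays under a 5.0 threshold unless some other category of rule also fires.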