We haven't had working statistics viewing for a few weeks, but now it is fixed and I'm amazed by the performance of RCVD_IN_MSPIKE_BL.

http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_MSPIKE_BL/detail

RCVD_IN_MSPIKE_BL has nearly the highest spam detection ratio of all the DNSBLs, second only to RCVD_IN_XBL. Our measurements also indicate that it detects this huge amount of spam with a very good ham safety rating.

* 84% overlap with RCVD_IN_XBL - redundancy isn't a huge problem here because XBL has a tiny score. But 84% is a surprisingly low overlap ratio for a rule that detects this much spam. This confirms that Mailspike is doing an excellent job of building their IP reputation database in a truly independent fashion.
* 67% overlap with RCVD_IN_PBL - overlap with PBL is concerning because PBL has a high score. But 67% isn't too bad compared to other production DNSBLs.
* 58% overlap with RCVD_IN_PSBL - pretty good
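
For readers unfamiliar with the overlap figures, here is a minimal sketch of how such a ratio can be computed, assuming overlap means "the share of messages hit by one rule that were also hit by another". The function name and the message IDs are hypothetical and are not taken from the ruleqa data or code.

```python
def overlap_percent(hits_a, hits_b):
    """Percentage of messages hit by rule A that were also hit by rule B."""
    if not hits_a:
        return 0.0
    return 100.0 * len(hits_a & hits_b) / len(hits_a)


# Hypothetical message IDs, purely illustrative (not real ruleqa results).
mspike_hits = {"m1", "m2", "m3", "m4", "m5"}
xbl_hits = {"m1", "m2", "m3", "m4", "m9", "m10"}

# The ratio is not symmetric: it depends on which rule is the baseline.
print(overlap_percent(mspike_hits, xbl_hits))
print(overlap_percent(xbl_hits, mspike_hits))
```

Because the ratio is asymmetric, a rule can look highly redundant from one side and fairly independent from the other, so it matters which rule the percentage is measured against.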

Given Mailspike's sustained decent performance since late 2009, it seems clear that it is a great candidate for addition to spamassassin-3.4 by default. It would be very interesting to see what it does to the scores during an automatic rescoring of the network rules.

Thoughts about Future Rescoring
===============================
Before that rescoring, we may want to have a serious discussion about reducing score pile-up in the case where multiple production DNSBLs all hit at the same time. Adam Katz' approach is one possibility, albeit confusing to users because they see subtractions in the score reports. There may be other, better approaches to this.
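
To make the pile-up concern concrete, here is a rough sketch of the arithmetic. The scores and the compensation scheme are assumptions invented for illustration; they are not SpamAssassin's actual scores and not Adam Katz' actual rules.

```python
# Hypothetical scores, for illustration only.
SCORES = {
    "RCVD_IN_PBL": 3.0,
    "RCVD_IN_XBL": 0.5,
    "RCVD_IN_MSPIKE_BL": 2.0,
}


def total_score(hits, scores=SCORES, overlap_penalty=0.0):
    """Sum the scores of the DNSBL rules that hit.

    overlap_penalty is a hypothetical amount subtracted for every
    DNSBL hit beyond the first, to dampen pile-up.
    """
    base = sum(scores[rule] for rule in hits)
    extra_hits = max(0, len(hits) - 1)
    return base - overlap_penalty * extra_hits


hits = ["RCVD_IN_PBL", "RCVD_IN_XBL", "RCVD_IN_MSPIKE_BL"]
print(total_score(hits))                       # plain sum: every DNSBL adds its full score
print(total_score(hits, overlap_penalty=1.0))  # dampened: later hits count for less
```

In a score report, any compensation of this kind would show up as a separate negative entry, which is the part likely to confuse users.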


In related news...
==================
http://www.spamtips.org/2011/01/dnsbl-safety-report-1232011.html
The January DNSBL Safety report found RCVD_IN_SEMBLACK to be reasonably safe, but at the time it overlapped with RCVD_IN_PBL 91% of the time, making it dangerously redundant due to PBL's high production score.

http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_SEMBLACK/detail
Our most recent measurements indicate that SEMBLACK is back to its previous behavior of an extremely poor safety rating, with false positives on ~7% of ham from recent weeks.

It was already a bad idea to use SEMBLACK earlier this year due to the high overlap with RCVD_IN_PBL, and this significant decline in safety rating is a clear indication that you should not be using RCVD_IN_SEMBLACK.

http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_HOSTKARMA_BL/detail
HOSTKARMA_BL overlaps with MSPIKE_BL 88% of the time, but detects far less spam and has slightly more FPs. Compared to last year, HOSTKARMA_BL's safety rating has improved considerably on a sustained basis, and if we were evaluating it alone it wouldn't be too bad. But now that we see the overlaps, HOSTKARMA_BL at this very moment is pretty close to a redundant and slightly less safe subset of RCVD_IN_MSPIKE_BL. Given these measurements, it probably isn't helpful to use HOSTKARMA_BL.

Warren Togami
war...@togami.com
