Jeff Chan <[EMAIL PROTECTED]> writes:

> I agree with the content check, but will step on many toes here by
> proclaiming that other blacklists (other than SBL), name servers,
> registrars, ISP address blocks, and similar approaches are overly
> broad and have too much potential for collateral damage *for my
> sensibilities*.

There are other blacklists just as accurate as SBL (and some more
accurate). Bear in mind these are secondary checks to lower the
threshold for a URI already reported to SpamCop, so the combined
accuracy should be really good: two 99%-accurate features are more
than 99% accurate together (see the sketch below).
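To make that arithmetic concrete, here is a back-of-the-envelope
sketch in Python. The 1% figures are illustrative assumptions, not
measured error rates for SpamCop or any particular DNSBL:

    # Illustrative only: assumes each check misfires on ham 1% of the
    # time and that the two checks fail independently.
    fp_spamcop = 0.01  # assumed FP rate of a SpamCop URI report
    fp_dnsbl = 0.01    # assumed FP rate of a corroborating DNSBL hit

    combined_fp = fp_spamcop * fp_dnsbl  # both must misfire at once
    print("combined FP rate: %.4f%%" % (combined_fp * 100))        # 0.0100%
    print("accuracy on ham:  %.4f%%" % ((1 - combined_fp) * 100))  # 99.9900%

Independence is the weak assumption here: blacklists built from the
same underlying reports will have correlated errors, so the real gain
is smaller, but the direction holds.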
 OVERALL%    SPAM%     HAM%     S/O   RANK  SCORE  NAME
   69948    37790     32158   0.540  0.00   0.00  (all messages)
  100.000  54.0258   45.9742  0.540  0.00   0.00  (all messages as %)
    1.016   1.8815    0.0000  1.000  0.93   8.60  RCVD_IN_OPM_SOCKS
    2.918   5.3956    0.0062  0.999  0.94   0.62  RCVD_IN_NJABL_DIALUP
    1.138   2.1037    0.0031  0.999  0.93   8.60  RCVD_IN_OPM_HTTP
    1.107   2.0455    0.0031  0.998  0.93   8.60  RCVD_IN_OPM_HTTP_POST
    7.769  14.3292    0.0591  0.996  0.94   1.27  RCVD_IN_SBL
    2.698   4.9749    0.0218  0.996  0.93   0.53  RCVD_IN_RSL
   19.630  36.1842    0.1772  0.995  0.97   2.55  RCVD_IN_SORBS_DUL
    3.127   5.7581    0.0342  0.994  0.92   0.74  RCVD_IN_NJABL_SPAM
    9.759  17.9360    0.1493  0.992  0.93   1.20  RCVD_IN_SORBS_MISC
    5.067   9.3146    0.0746  0.992  0.92   0.01  T_RCVD_IN_AHBL_SPAM
    0.815   1.4978    0.0124  0.992  0.91   1.20  RCVD_IN_SORBS_SMTP
   32.202  59.1532    0.5317  0.991  0.99   1.10  RCVD_IN_DSBL
   17.386  31.8735    0.3607  0.989  0.95   1.00  RCVD_IN_XBL
   13.524  24.8002    0.2736  0.989  0.94   1.20  RCVD_IN_NJABL_PROXY
    9.088  16.6711    0.1772  0.989  0.93   1.20  RCVD_IN_SORBS_HTTP

(Some older mail is being tested, so these numbers are going to be
somewhat off.)

> I really, really hate blacklisting innocent victims. I consider
> that a false accusation or even false punishment. Having policies
> which allow blacklisting an entire ISP or even an entire web server
> IP address have the potential to harm too many innocent bystanders,
> IMO. Your mileage may and probably does vary. ;)

You already have a repeated-URL check. Are you just railing against
other blacklists, or did you really consider my suggestion? SpamCop
is no more accurate than the blacklists above. People report ham all
the time, sometimes repeatedly.

> Our approach is to start with some likely good data in the
> SpamCop URIs. See comments below.

And these are ways to make that data more accurate.

> I agree in principle, however I feel that the SpamCop reported
> URIs tend to have relatively few FPs. They are domains that
> people took the time to report; in essence they are *voting with
> their time that these are spam domains*.

Again, SpamCop has false positives. It is no magic bullet. Some
mailing lists are very low volume, so when an announcement or
conference notice goes out, people report it as spam even though they
actually subscribed. It happens all the time. I think pre-seeding a
whitelist would be a sensible precaution against joe jobs and against
the more sporadic false positives (rare for any one domain, but
SpamCop as a whole probably sees some every day).

> I hope I'm not taking too confrontational a tone here. I'm just
> trying to defend our approach, which I think can be valid.

Nobody is attacking your approach. I only made these suggestions to
potentially allow you to selectively lower or raise your threshold
for specific URLs based on other data and therefore increase your
accuracy and spam hit rate. A rough sketch of what I mean follows.
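This is a hypothetical illustration, not SpamCop's or your actual
code: the threshold values, the whitelist contents, and the
sbl_listed() helper are all invented for the example. It just shows
the shape of the idea: corroborating evidence (here, the spamvertised
site's IP appearing in SBL) lowers the number of reports needed to
list a domain, and a pre-seeded whitelist always wins.

    import socket

    WHITELIST = {"example.org"}    # pre-seeded known-good domains
    BASE_THRESHOLD = 10            # reports needed on their own
    CORROBORATED_THRESHOLD = 3     # reports needed with a second source

    def sbl_listed(ip):
        """True if the IP address is listed in the SBL DNS zone."""
        query = ".".join(reversed(ip.split("."))) + ".sbl.spamhaus.org"
        try:
            socket.gethostbyname(query)   # DNSBLs answer only for listed IPs
            return True
        except socket.gaierror:
            return False

    def should_blacklist(domain, report_count):
        if domain in WHITELIST:
            return False                  # joe-job protection
        try:
            ip = socket.gethostbyname(domain)
        except socket.gaierror:
            return report_count >= BASE_THRESHOLD
        if sbl_listed(ip):
            return report_count >= CORROBORATED_THRESHOLD
        return report_count >= BASE_THRESHOLD

The point is that independent corroboration lets you act on fewer
reports without hurting accuracy, while the whitelist caps the damage
any burst of bogus reports can do.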
I suspect your blacklist will work well once a plug-in supports it,
but until then it seems like further discussion is a waste of my time.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/  and open source consulting