John Hardin wrote:
On Tue, 23 Sep 2008, Rob McEwen wrote:
Or, these could be "False-False Positives"... which is a very good thing, because it would mean those messages were really spam that would have scored "below threshold" without the new list. (Or some mix of the two.)
So, for the purposes of an analysis like this, perhaps the results should be broken into *three* categories: obviously spam, obviously ham, and borderline.

Those initial stats are computer-generated. Any follow-up analysis should be more human-generated. There is definitely a "borderline" category, but I'd suggest the computer-generated stats be left alone. Trying to derive a "borderline" bucket from the spam filter's scoring alone is a bad idea. Why? Because, simply put, some DNSBLs catch spam that, quite frankly, scores very low in many systems when that DNSBL is absent (think of "first responder" DNSBLs!). So splitting results into subcategories based only on computer-generated scoring just muddies the waters further.

Instead, the person running the stats could examine the actual messages (that is, those the spam filter classified as "ham") more closely and then follow up the computer-generated stats with their own opinion about what they saw in those messages. Even a cursory analysis would be far better than nothing. Few will have the time or inclination to get extremely detailed in such analysis, though that would be great too. But even a little analysis of that "ham" pile beats nothing. (NOT complaining about Alex's post, btw... again, that is why he said "fwiw"... this is more of a general suggestion for everyone about such stats.)
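For anyone wanting to do that kind of follow-up, here is a minimal sketch of how the review pile could be pulled together. It assumes you have already exported per-message scan results to a CSV; the column names, the 5.0 threshold, and the NEW_DNSBL rule name are placeholders for illustration, not anything from the original stats run:

    #!/usr/bin/env python3
    # Sketch: collect messages that hit the new DNSBL but still scored below
    # the spam threshold, so a human can eyeball them instead of trusting
    # score-based "borderline" buckets. Column names, the 5.0 threshold, and
    # the NEW_DNSBL rule name are assumptions for illustration only.
    import csv

    THRESHOLD = 5.0          # assumed spam threshold
    NEW_RULE = "NEW_DNSBL"   # placeholder name for the rule firing on the new list

    def messages_for_review(results_csv):
        """Yield (message_id, score, rules) for DNSBL hits the filter called ham."""
        with open(results_csv, newline="") as fh:
            for row in csv.DictReader(fh):
                score = float(row["score"])
                rules = row["rules_hit"].split()
                if NEW_RULE in rules and score < THRESHOLD:
                    yield row["message_id"], score, rules

    if __name__ == "__main__":
        for msg_id, score, rules in messages_for_review("scan_results.csv"):
            print(f"{msg_id}\t{score:.1f}\t{' '.join(rules)}")

The point is just to produce a short list a human can skim; whether each one is a missed spam (a "false-false positive") or a genuine FP still has to be a judgment call made by looking at the message itself.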

--
Rob McEwen
http://dnsbl.invaluement.com/
[EMAIL PROTECTED]
+1 (478) 475-9032


