> -----Original Message-----
> From: Mike Leone
> Sent: Sunday, June 01, 2003 8:25 AM
>
> So bayes wouldn't learn this was spam, unless the score was 19? I rarely get
> spam with scores higher than that. Am I misunderstanding?
>
In my recent spam mailbox, with about 2500 messages (over the past 10 days),
the scores are distributed as follows (percentile of messages, then score):
0% 5.00
10% 7.10
20% 8.50
30% 10.20
40% 12.00
50% 13.90
60% 15.80
70% 18.20
80% 20.90
90% 25.30
100% 62.50
I've found the "if score >= 10.0 then probably spam; if score >= 20.0 then
definitely spam" rule of thumb to be a pretty good one. Mileage varies. BTW,
the scores above use some small adjustments to the out-of-the-box SA 2.55
scores, which I tuned to eliminate the marginal spams without increasing
false positives.
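
For anyone who wants to build the same kind of table from their own mailbox,
here is a minimal sketch. It assumes the scores have already been pulled out
of the messages into a plain list of floats; the function names and the
nearest-rank percentile method are illustrative choices, not necessarily how
the numbers in this message were produced.

```python
def decile_table(scores):
    """Return [(percent, score), ...] for 0%, 10%, ..., 100%.

    Uses the nearest-rank method (an assumption for this sketch): the
    value at p% is the smallest score with at least p% of the sample at
    or below it. 0% is the minimum and 100% is the maximum.
    """
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)
    n = len(ordered)
    table = []
    for pct in range(0, 101, 10):
        # Integer ceiling of pct * n / 100, clamped so 0% maps to the minimum.
        rank = max(1, (pct * n + 99) // 100)
        table.append((pct, ordered[rank - 1]))
    return table


def format_table(table):
    """Render the table one 'percent score' pair per line."""
    return "\n".join(f"{pct}% {score:.2f}" for pct, score in table)
```

Calling `print(format_table(decile_table(scores)))` prints one line per
decile in the same layout as the tables in this message.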
A similar-sized sample (2200 messages over the past 1.5 months) from my
archived incoming mail (ham) is distributed as follows:
0% -107.80
10% -1.30
20% -0.50
30% 0.00
40% 0.50
50% 0.80
60% 1.10
70% 1.90
80% 2.60
90% 3.40
100% 4.90
Here, I'd say that 2.60 and below is most likely ham, not spam.
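
Putting the rules of thumb so far into code, as a rough sketch only: the
cutoffs (2.60, 10.0, 20.0) are the ones quoted in this message, while the
function name, the labels, and the "uncertain" band for everything in
between are assumptions added for illustration.

```python
def classify(score, ham_max=2.60, probable_spam=10.0, definite_spam=20.0):
    """Map a SpamAssassin score to a rough confidence band.

    Cutoffs default to the rule-of-thumb values from this message; tune
    them for your own mail, since mileage varies.
    """
    if score >= definite_spam:
        return "definitely spam"
    if score >= probable_spam:
        return "probably spam"
    if score <= ham_max:
        return "most likely ham"
    # Scores between ham_max and probable_spam are the ambiguous middle.
    return "uncertain"
```

For example, `classify(25.3)` lands in the "definitely spam" band and
`classify(3.8)` in the "uncertain" band.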
In my "false negatives" folder (about 750 items collected over the course
of 9 months), the distribution is:
0% -6.20
10% 1.30
20% 2.20
30% 2.60
40% 3.40
50% 3.80
60% 4.10
70% 4.40
80% 4.60
90% 4.80
100% 4.90
If I set the threshold at 3.4, I'd catch 60% of the false negatives
(spam misclassified as ham), since 3.4 is their 40th percentile, but I'd
also flag roughly 10% of the ham as spam, since 3.4 is the 90th percentile
of the ham. Probably a bad trade. A better approach is to find
filters/scores that differentiate them.
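
The threshold arithmetic can be checked against the two decile tables with
a short sketch like the one below. The lists are copied from the ham and
false-negative tables in this message; the function name is hypothetical,
and because it works from deciles only, the answer is coarse (10% steps).

```python
# Decile scores (0%, 10%, ..., 100%) copied from the tables in this message.
HAM_DECILES = [-107.80, -1.30, -0.50, 0.00, 0.50, 0.80,
               1.10, 1.90, 2.60, 3.40, 4.90]
FALSE_NEG_DECILES = [-6.20, 1.30, 2.20, 2.60, 3.40, 3.80,
                     4.10, 4.40, 4.60, 4.80, 4.90]


def percent_at_or_above(deciles, threshold):
    """Estimate the percentage of messages scoring >= threshold.

    Finds the first decile whose score reaches the threshold and treats
    everything from that decile upward as being at or above it.
    """
    for i, score in enumerate(deciles):
        if score >= threshold:
            return 100 - 10 * i
    return 0  # the threshold is above the highest score in the sample


caught = percent_at_or_above(FALSE_NEG_DECILES, 3.4)  # missed spam now caught
lost = percent_at_or_above(HAM_DECILES, 3.4)          # ham now flagged as spam
print(f"threshold 3.4: catches {caught}% of false negatives, flags {lost}% of ham")
```

Trying other thresholds the same way makes it easy to see that any cutoff
low enough to catch most of the missed spam also cuts into the ham.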
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk