On Sun, July 29, 2007 20:25, David Abrahams wrote:
>
> on Fri Jul 27 2007, skip-AT-pobox.com wrote:
>
>> Brendon> I've just started using spambayes again after a while away from
>> Brendon> it. Now, 3 days in, I notice that I've trained on far more
>> Brendon> spam than ham. (Total emails trained: Spam: *432* Ham: *64*) I
>> Brendon> seem to remember that this was previously my experience in the
>> Brendon> past.
>>
>> Are you training on every message you receive or just the mistakes? Most
>> people generally only train on the mistakes and unsures. Your ratio is
>> about 7:1. That's a bit high.
>
> Even training only on mistakes and unsures, I have had a steadily
> increasing ratio for months. I almost never see a misclassified ham
> and only very rarely a ham about which the system is unsure. It's
> unsure about spam every day.

I have the same experience:

    [EMAIL PROTECTED] { ~ }$ ./spamstats
    Spam: 2415 Ham: 651

That's 3.7:1, and it's increasing. Nonetheless I have never seen a false
positive. I only train on mistakes and unsures.
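
For anyone who wants to check the ratios quoted in this thread, here is a minimal sketch of the arithmetic. The function name `spam_ham_ratio` is made up for illustration and is not part of spambayes; the counts are the ones given above.

```python
def spam_ham_ratio(spam: int, ham: int) -> float:
    """Return how many spam messages were trained per trained ham."""
    # Hypothetical helper, not a spambayes API.
    if ham <= 0:
        raise ValueError("need at least one trained ham to form a ratio")
    return spam / ham


# Training counts quoted in this thread.
counts = [("Brendon", 432, 64), ("Amedee", 2415, 651)]

for who, spam, ham in counts:
    print(f"{who}: about {spam_ham_ratio(spam, ham):.1f} spam per ham")
```

This only restates the counts as a ratio; it says nothing about how well the classifier performs at that ratio.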

Most of my email is to or from the same 50 people or so. Most of the time
they write messages longer than 50 words, and almost all of those are in
Dutch. The very few times I saw a spam classified as ham, it had Dutch
nonsense words in it.

I would agree that in theory having equal amounts of ham and spam would be
better. In my particular case, however, there are significant factors that
reduce the need for a 1:1 ratio. I would also say that my particular
situation cannot be used to draw general conclusions, and that Your
Mileage May Vary(tm).

--
Amedee