On 04/27/2013 10:59 AM, Jari Fredriksson wrote:
On 27.04.2013 04:54, Karsten Bräckelmann wrote:
And it is good advice to keep the initial training corpora to a
ratio roughly resembling your ham/spam ratio, or maybe 1:1. (At this
point, we're approaching voodoo. Learning 10 times more ham than spam is
most likely a bad choice, though.)
I don't see any problem with having a corpus like this:
0.000 0 28252 0 non-token data: nspam
0.000 0 187579 0 non-token data: nham
I have no problems with Bayes whatsoever.
How many users? How many domains?
That can hardly be a heavily spammed setup, or it would look more like:
0.000 0 7762525 0 non-token data: nspam
0.000 0 4171794 0 non-token data: nham
(a week's worth of tokens)
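For anyone comparing their own database, the nspam/nham counts and their ratio can be pulled straight out of `sa-learn --dump magic` output. A minimal sketch, assuming the whitespace-separated line layout quoted above (the real dump may pad columns differently, but the regex tolerates that):

```python
import re

def bayes_ratio(dump_text):
    """Extract nspam/nham counts from `sa-learn --dump magic` output
    and return (nspam, nham, ham_per_spam)."""
    counts = {}
    for line in dump_text.splitlines():
        # Match: <count> <whitespace> <atime> 'non-token data: nspam|nham'
        m = re.search(r'(\d+)\s+\d+\s+non-token data: (nspam|nham)', line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    nspam, nham = counts['nspam'], counts['nham']
    return nspam, nham, nham / nspam

# Sample lines from the smaller database quoted in this thread.
dump = """\
0.000 0 28252 0 non-token data: nspam
0.000 0 187579 0 non-token data: nham
"""
nspam, nham, ratio = bayes_ratio(dump)
print(nspam, nham, round(ratio, 1))  # roughly 6.6 ham per spam
```

By the 1:1-ish rule of thumb discussed above, a ratio of ~6.6 ham per spam is the kind of imbalance being warned about, though as noted it evidently works fine for some setups.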