(Please send replies to the list)

Henrique Fernandes wrote:
> On Thu, Mar 4, 2010 at 2:22 PM, Bowie Bailey <bowie_bai...@buc.com> wrote:
>
> > Henrique Fernandes wrote:
> > > No, I want that after I have trained on an email, the same email
> > > should get a higher score, because SpamAssassin was trained that
> > > it is spam. So when it comes in again, it should look in the
> > > database and add some extra points to the score, right?
> >
> > That is a fairly common misconception. When you learn an email as
> > spam, the Bayes system breaks it into tokens (words/character
> > strings) and then makes a note that each of those tokens was seen
> > in a spam. When an email comes in, it breaks up the new email into
> > tokens and then checks to see how frequently each of those tokens
> > was previously seen in spam or ham. Based on what it finds, it
> > ranks the email from BAYES_00 (very unlikely to be spam) to
> > BAYES_99 (almost certainly spam).
> >
> > Since learning from a single email only adds one data point to
> > each token, it is unlikely to make a major difference on its own.
> > The value comes in learning from lots of spam and ham. This is why
> > the Bayes rules will not run until you have learned from at least
> > 200 ham and 200 spam.
>
> hmm
>
> Thanks, so each individual user has to have learned lots of emails,
> and only after that will they start to see a difference in the score?

Yes. Each individual user will need to learn at least 200 ham and 200
spam (manually or via auto-learn) before Bayes will start scoring. The
more they learn, the better the accuracy.

> So is it better to train just one database for all users instead of
> one database for each user?
>
> With just one database I am afraid of getting too many false
> positives. Sometimes "Viagra" is not spam for someone who researches
> it, but if it is in the same database, it will be marked as spam...

Depends on your users. Unless they are wildly different, a single
database should work fairly well. Individual databases can be more
accurate in some instances, but a single well-trained database will
probably work better than a bunch of individual databases that are not
trained consistently.

--
Bowie
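
For anyone who wants to see the token counting idea in concrete form,
here is a rough Python sketch. It is a toy illustration only and not
SpamAssassin's actual Bayes code: the class name ToyBayes, the simple
regex tokenizer, and the plain averaging of per-token ratios are all
assumptions made for this example. The only details taken from the
thread are the 200 ham / 200 spam minimum and the idea that learning
one message adds one data point per token.

```python
import re
from collections import Counter

# Minimums quoted in the thread; everything else here is hypothetical.
MIN_HAM = 200
MIN_SPAM = 200


class ToyBayes:
    """Toy token-frequency database (not SpamAssassin's implementation)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    @staticmethod
    def tokenize(text):
        # A set, so each token counts at most once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, text, is_spam):
        # Learning one message adds exactly one data point to each of its tokens.
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def token_spamminess(self, token):
        # Ratio of how often the token appears in spam versus ham (0.0 to 1.0).
        s = self.spam_tokens[token] / self.nspam if self.nspam else 0.0
        h = self.ham_tokens[token] / self.nham if self.nham else 0.0
        if s + h == 0:
            return 0.5  # never seen: no evidence either way
        return s / (s + h)

    def score(self, text):
        # Refuse to score until enough ham and spam have been learned.
        if self.nspam < MIN_SPAM or self.nham < MIN_HAM:
            return None
        known = [t for t in self.tokenize(text)
                 if t in self.spam_tokens or t in self.ham_tokens]
        if not known:
            return 0.5
        # Simplification: a plain average of the per-token ratios.
        return sum(self.token_spamminess(t) for t in known) / len(known)
```

In this sketch, score() returns None until both minimums are met, and
learning one extra message that contains a token shifts that token's
ratio only slightly once a few hundred messages have been learned. That
is the same reason re-training on a single email rarely changes its
score much.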