Re: About Training ( sa-learn )

Henrique Fernandes Thu, 04 Mar 2010 10:11:11 -0800

Thanks!

I will discuss it here and find out which one is better.


What weight do the Bayes scores carry once the database is well trained? Do
you have any ideas about it?

[]'sf.rique


On Thu, Mar 4, 2010 at 2:41 PM, Bowie Bailey <bowie_bai...@buc.com> wrote:

> (Please send replies to the list)
>
> Henrique Fernandes wrote:
> >
> > On Thu, Mar 4, 2010 at 2:22 PM, Bowie Bailey <bowie_bai...@buc.com
> > <mailto:bowie_bai...@buc.com>> wrote:
> >
> >     Henrique Fernandes wrote:
> >     > No, what I want is that after I train on an email, the same email
> >     > should get a higher score, because SpamAssassin was trained that it
> >     > is spam. So when it comes in again, it should look in the database
> >     > and add some extra points to the score, right?
> >
> >     That is a fairly common misconception.  When you learn an email as
> >     spam, the Bayes system breaks it into tokens (words/character
> >     strings) and then makes a note that each of those tokens was seen
> >     in a spam.  When an email comes in, it breaks up the new email into
> >     tokens and then checks to see how frequently each of those tokens
> >     was previously seen in spam or ham.  Based on what it finds, it
> >     ranks the email from BAYES_00 (very unlikely to be spam) to
> >     BAYES_99 (almost certainly spam).
> >
> >     Since learning from a single email only adds one data point to each
> >     token, it is unlikely to make a major difference on its own.  The
> >     value comes in learning from lots of spam and ham.  This is why the
> >     Bayes rules will not run until you have learned from at least 200
> >     ham and 200 spam.
> >
> >
> > hmm
> >
> > Thanks, so each individual user has to have learned lots of emails
> > before that starts to make a difference to the score?
>
> Yes. Each individual user will need to learn at least 200 ham and 200
> spam (manually or via auto-learn) before Bayes will start scoring.  The
> more they learn, the better the accuracy.
>
> > So is it better to just train one database for all users instead of
> > one database for each user?
> >
> > With just one database I am afraid of getting too many false positives,
> > because sometimes Viagra is not spam for someone who researches it, but
> > if it is in the shared database, it will be marked as spam...
>
> Depends on your users.  Unless they are wildly different, a single
> database should work fairly well.  Individual databases can be more
> accurate in some instances, but a single well-trained database will
> probably work better than a bunch of individual databases that are not
> trained consistently.
>
> --
> Bowie
>
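
The token counting Bowie describes above can be sketched in a few lines of
Python. This is only a toy illustration, not SpamAssassin's code: the class
name ToyBayes, the tokenizer regex, and the plain naive-Bayes combining with
add-one smoothing are all assumptions made for the example (SpamAssassin's
own tokenizer and probability combining are more involved). The only detail
taken from the thread is the threshold of 200 ham and 200 spam before any
score is produced.

```python
import math
import re
from collections import Counter

# Thresholds quoted in the thread: no Bayes result until this much is learned.
MIN_HAM = 200
MIN_SPAM = 200


class ToyBayes:
    """Toy token-frequency classifier (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of ham messages containing it
        self.nspam = 0                # spam messages learned so far
        self.nham = 0                 # ham messages learned so far

    @staticmethod
    def tokenize(text):
        # Very crude tokenizer: lowercase runs of letters, digits and a few symbols.
        # A set is used so each token counts once per message.
        return set(re.findall(r"[a-z0-9$'!-]+", text.lower()))

    def learn(self, text, is_spam):
        # Learning only records "this token was seen in a spam (or ham) message".
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def spam_probability(self, text):
        # Refuse to score until enough ham and spam have been learned.
        if self.nspam < MIN_SPAM or self.nham < MIN_HAM:
            return None
        log_odds = 0.0
        for tok in self.tokenize(text):
            # Add-one smoothing so unseen tokens do not give a zero probability.
            p_spam = (self.spam_tokens[tok] + 1) / (self.nspam + 2)
            p_ham = (self.ham_tokens[tok] + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp before converting to a probability to avoid overflow in exp().
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    bayes = ToyBayes()
    print(bayes.spam_probability("buy cheap pills"))  # nothing learned yet
    for _ in range(200):
        bayes.learn("cheap pills buy now", is_spam=True)
        bayes.learn("meeting agenda for monday", is_spam=False)
    print(bayes.spam_probability("buy cheap pills"))
    print(bayes.spam_probability("monday meeting agenda"))
```

The point of the sketch is that learning a message does not store "this exact
email is spam"; it only updates per-token counts, and a new message is ranked
from how often its tokens were seen in spam versus ham.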
