Re: About Training ( sa-learn )

Bowie Bailey Thu, 04 Mar 2010 09:22:38 -0800

Henrique Fernandes wrote:
> Nope, I want the same email to get a higher score after I have
> trained on it, because SpamAssassin was trained that it is spam.  So
> when it comes in again, it should look in the database and add some
> extra points to the score, right?


That is a fairly common misconception.  Bayes does not look up a
specific message and add points because it has seen it before.  When
you learn an email as spam, the Bayes system breaks it into tokens
(words/character strings) and makes a note that each of those tokens
was seen in a spam message.  When a new email comes in, it is broken
into tokens in the same way, and Bayes checks how frequently each of
those tokens was previously seen in spam or ham.  Based on what it
finds, it ranks the email from BAYES_00 (very unlikely to be spam) to
BAYES_99 (almost certainly spam).
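
To make the token idea concrete, here is a rough toy sketch in Python.
It is NOT SpamAssassin's actual algorithm or code: the class name
ToyBayes, the tokenizer regex, the smoothing, and the simple averaging
of per-token values are all assumptions made up for illustration.  It
only shows the general shape: learning updates per-token counts, and
scoring looks at how often each token of a new message was seen in spam
versus ham.

```python
import re
from collections import Counter


class ToyBayes:
    """Toy token-frequency filter (illustration only, not SpamAssassin)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of ham messages containing it
        self.nspam = 0                # spam messages learned
        self.nham = 0                 # ham messages learned

    @staticmethod
    def tokenize(text):
        # Hypothetical tokenizer: the set of lowercase word-like strings.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def learn(self, text, is_spam):
        # Learning only bumps the counters of the tokens in this message.
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def token_spamminess(self, token):
        # Smoothed fraction of spam/ham messages that contained the token.
        p_spam = (self.spam_tokens[token] + 1) / (self.nspam + 2)
        p_ham = (self.ham_tokens[token] + 1) / (self.nham + 2)
        return p_spam / (p_spam + p_ham)  # 0.5 means "no evidence either way"

    def score(self, text):
        # Assumed combining rule: plain average of the per-token values.
        tokens = self.tokenize(text)
        if not tokens:
            return 0.5
        return sum(self.token_spamminess(t) for t in tokens) / len(tokens)


if __name__ == "__main__":
    bayes = ToyBayes()
    bayes.learn("cheap pills buy now", is_spam=True)
    bayes.learn("buy cheap watches now", is_spam=True)
    bayes.learn("meeting agenda for monday", is_spam=False)
    bayes.learn("lunch on monday", is_spam=False)
    print(bayes.score("buy cheap pills"))  # leans towards spam (above 0.5)
    print(bayes.score("monday meeting"))   # leans towards ham (below 0.5)
```

Note that nothing here remembers a whole message.  A message scores
high only because its individual tokens have mostly been seen in spam.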

Since learning from a single email only adds one data point to each
token, it is unlikely to make a major difference on its own.  The value
comes in learning from lots of spam and ham.  This is why the Bayes
rules will not run until you have learned from at least 200 ham and 200
spam.
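
A small numeric sketch of why one message matters so little, again in
Python and again only a toy: the smoothing formula and the example
counts (a token seen in 40 of 500 spam and 60 of 500 ham) are invented
for illustration, and bayes_active() is a hypothetical helper that just
mirrors the 200 ham / 200 spam minimum mentioned above.

```python
MIN_SPAM = 200  # minimum learned spam before Bayes is used (from the text above)
MIN_HAM = 200   # minimum learned ham before Bayes is used (from the text above)


def bayes_active(nspam, nham, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    # Hypothetical gate: too little training data means no Bayes result at all.
    return nspam >= min_spam and nham >= min_ham


def token_spamminess(spam_hits, ham_hits, nspam, nham):
    # Same toy estimate as before: smoothed share of spam vs. ham sightings.
    p_spam = (spam_hits + 1) / (nspam + 2)
    p_ham = (ham_hits + 1) / (nham + 2)
    return p_spam / (p_spam + p_ham)


# Invented counts: token seen in 40 of 500 spam and 60 of 500 ham messages.
before = token_spamminess(40, 60, 500, 500)
# Learn ONE more spam message that contains the token.
after = token_spamminess(41, 60, 501, 500)
shift = after - before

if __name__ == "__main__":
    print(f"before={before:.4f} after={after:.4f} shift={shift:.4f}")
    print(bayes_active(199, 500))  # False: not enough spam learned yet
    print(bayes_active(200, 200))  # True: both minimums reached
```

With these made-up numbers the token's value moves by well under one
percentage point, which is the sense in which a single learned message
is just one more data point.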

-- 
Bowie
