Re: bayes training question

mizzio Mon, 23 May 2005 05:46:39 -0700

Thank you very much, Loren.

regards,
mizzio


On Mon, 23-05-2005 at 04:51 -0700, Loren Wilton wrote:
> > - I get some messages marked as SPAM coming from this mailing list,
> > since the body contains URLs and text from real spam messages: do I have
> > to feed them in my DB as ham or can this cause some kind of bayes
> > poisoning ?
> 
> The best thing is to avoid having the mail from this list go through SA.
> There are various ways to do this, depending on your mail setup.
> 
> 
> > - I assume that the training is more important for the messages marked
> > with BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score:
> > 0.5998]; is this correct ?
> 
> Probably most important are cases where Bayes guessed wrong, rather than
> simply not being real sure.  Always train as ham or spam anything you see
> that Bayes decided to lean the other way.  This way it will get to know what
> is what for you.
> 
> Second most important would be training stuff that scores close to 50%.
> Personally I tend to dump most spam that scores less than about 80% into the
> spam training bucket.  Now and then I'll throw a handful of known ham in the
> ham bucket, to try to keep the number of learned ham/spam somewhat balanced.
> 
> 
> > - Shall I also train as ham the messages not marked as SPAM but having a
> > score between 1/2 and 3/4 ? I mean, does also feeding "normal" messages
> > into the system help to get good bayes filtering ?
> 
> I'm not absolutely sure what you are saying here.  If you are asking if you
> should train known ham as ham, the answer is yes.  Bayes needs to be able to
> decide which tokens are ham and which are spam.  It can only do this if it
> sees both ham and spam.  If you have ham that is hitting more than 20 or 30%
> you should certainly train it as ham.  However, even throwing ham that
> scores near 0 into training every so often is a good idea.
> 
>         Loren
> 
>
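
For reference, Loren's rules of thumb above could be sketched roughly as
follows. This is only an illustration, not anything SpamAssassin provides:
the function name training_action and the exact cutoffs (0.80 for spam,
0.20 for ham) are assumptions picked from the "about 80%" and "20 or 30%"
figures in the reply, and bayes_prob stands for the Bayes probability
reported with the BAYES_xx rule.

```python
from typing import Optional

# Hypothetical cutoffs taken from the figures mentioned in the reply.
SPAM_TRAIN_BELOW = 0.80  # "spam that scores less than about 80%"
HAM_TRAIN_ABOVE = 0.20   # "ham that is hitting more than 20 or 30%"


def training_action(bayes_prob: float, is_spam: bool) -> Optional[str]:
    """Suggest a training bucket for one hand-classified message.

    bayes_prob is the Bayes spam probability (0.0 to 1.0) and is_spam is
    your own verdict. Returns "spam", "ham", or None when training the
    message is not a priority.
    """
    if is_spam:
        # Covers spam that Bayes leaned the wrong way on (below 50%)
        # as well as spam it was unsure about (around 50%).
        return "spam" if bayes_prob < SPAM_TRAIN_BELOW else None
    # Ham with a noticeable spam probability should be trained as ham.
    # Ham near 0 is still worth adding now and then to keep the learned
    # ham/spam counts roughly balanced; that is not modelled here.
    return "ham" if bayes_prob > HAM_TRAIN_ABOVE else None
```

With the BAYES_50 example from the question (score 0.5998), a message you
know to be spam would go to the spam bucket, and one you know to be ham
would go to the ham bucket.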
