Thank you very much, Loren.

Regards,
mizzio

On Mon, 23-05-2005 at 04:51 -0700, Loren Wilton wrote:
> > - I get some messages marked as SPAM coming from this mailing list,
> > since the body contains URLs and text from real spam messages: do I have
> > to feed them in my DB as ham or this can cause some kind of bayes
> > poisoning?
>
> The best thing is to avoid having the mail from this list go through SA.
> There are various ways to do this, depending on your mail setup.
>
> > - I assume that the training is more important for the messages marked
> > with BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score:
> > 0.5998]; is this correct?
>
> Probably most important are cases where Bayes guessed wrong, rather than
> simply not being real sure. Always train as ham or spam anything you see
> that Bayes decided to lean the other way. This way it will get to know what
> is what for you.
>
> Second most important would be training stuff that scores close to 50%.
> Personally I tend to dump most spam that scores less than about 80% into the
> spam training bucket. Now and then I'll throw a handful of known ham in the
> ham bucket, to try to keep the number of learned ham/spam somewhat balanced.
>
> > - Shall I train as ham also the messages not marked as SPAM but having a
> > score close between 1/2 and 3/4? I mean, feeding also "normal" messages
> > into the system helps to have a good bayes filtering?
>
> I'm not absolutely sure what you are saying here. If you are asking if you
> should train known ham as ham, the answer is yes. Bayes needs to be able to
> decide which tokens are ham and which are spam. It can only do this if it
> sees both ham and spam. If you have ham that is hitting more than 20 or 30%
> you should certainly train it as ham. However, even throwing ham that
> scores near 0 into training every so often is a good idea.
>
> Loren
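
For reference, the training order Loren describes can be sketched as a small triage function. This is only an illustration, not part of SpamAssassin: the function name training_priority, its labels, and the exact cut-offs (0.4 to 0.6 for "unsure", taken from the BAYES_50 description; 0.8 for weak spam; 0.2 for ham that scores too high) are assumptions read off the advice above. It takes the Bayes probability reported for a message and what the message really is, and says how urgently it should be fed to training.

```python
def training_priority(bayes_prob: float, is_spam: bool) -> str:
    """Classify a hand-verified message by how useful it is for Bayes training.

    bayes_prob is the Bayesian spam probability (0.0 to 1.0) reported for the
    message; is_spam is the human verdict. Labels and thresholds are
    hypothetical, chosen to mirror the advice in the quoted mail.
    """
    # Highest priority: Bayes leaned the wrong way for this message.
    if (is_spam and bayes_prob < 0.5) or (not is_spam and bayes_prob > 0.5):
        return "wrong"
    # Next: Bayes was undecided (the BAYES_50 band, 40 to 60%).
    if 0.4 <= bayes_prob <= 0.6:
        return "unsure"
    # Then: spam scoring under about 80%, or ham scoring over about 20%.
    if (is_spam and bayes_prob < 0.8) or (not is_spam and bayes_prob > 0.2):
        return "weak"
    # Everything else: already well classified, train only now and then
    # to keep the learned ham/spam counts roughly balanced.
    return "occasional"


if __name__ == "__main__":
    for prob, spam in [(0.30, True), (0.5998, True), (0.70, True), (0.01, False)]:
        print(prob, "spam" if spam else "ham", "->", training_priority(prob, spam))
```

Messages labelled "wrong" would be trained first, then "unsure", then "weak"; the "occasional" ones only need a handful thrown in from time to time.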