Thank you sir :)

-----Original Message-----
From: Bowie Bailey [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 22, 2006 4:35 PM
To: users@spamassassin.apache.org
Subject: RE: Bayes Advise and Question ?

Vahric MUHTARYAN wrote:
>
> I read some articles about Bayes and something is not clear to me,
> and I need the SpamAssassin people's advice.
>
> I'm using the SpamAssassin rules, some SARE rules and Razor, and I'm
> happy with spam detection. First I thought that I should disable
> autolearning and train Bayes manually

Quite a few people will tell you that this is the best method, but if
you do manual training, you have to keep training it. You are never
finished with the training because the spams keep changing.

> but after some reading I saw that some commercial products say Bayes
> must train for at least 2 weeks, and the default SpamAssassin behavior
> is 200 ham and 200 spam messages. Before, I thought of setting the ham
> and spam numbers very low and training SpamAssassin only with the spam
> mails that it did not detect ... is that the right idea?

No, you have to train with both spam and ham so that Bayes can learn
to tell the difference.

> but I saw that I have to train with ham and spam together because the
> same words can appear in spam mails or in ham mails .... What do you
> advise? Should I train Bayes manually, or automatically while giving
> it a long time before trusting Bayes?

That is debatable. I would suggest that you train it manually with
every email that comes through your system for a while. Once you get
to 200 ham and 200 spam (Bayes does not contribute to the score until
it has learned at least that many of each) and it starts working for
you, you can switch to either automatic learning, or continue manual
learning with just the messages that are scored wrong.

> My system spam score threshold is 4.5, so it seems that
> "bayes_auto_learn_threshold_spam" must be set to 4.5, right? And if I
> set it to 4.5, then what will the header and body % be for it to work?!

No, those are two separate settings. The spam threshold (required_hits)
is the number of points needed before SpamAssassin will mark a message
as spam. bayes_auto_learn_threshold_spam is the number of points needed
before Bayes will learn a message as spam.
This should be higher than your required_hits to avoid learning false
positives as spam. The default is 12.0, and unless you have a reason
to distrust it, I wouldn't change it.

bayes_auto_learn_threshold_nonspam is the maximum score for a message
that Bayes learns as ham (or nonspam). This defaults to 0.1, but some
people suggest that you should drop it to 0 or even -0.1 to avoid
learning false negatives.

> And I guess if the system didn't meet the 3 header / 3 body
> requirement, then I have to train the system manually, right?

Right.

> Is anybody using the journal for Bayes learning? It solves the file
> locking problem, but I think locking is not an issue for anyone using
> a database environment, right?

I don't think locking is an issue if you are using MySQL or another DB
to hold the Bayes database. But then, I'm not using a database myself,
so I'm probably not the right person to answer this question.

> Which way should we choose for Bayes learning, database or file? We
> are handling more than 500,000 mails a day!

Database is probably the way to go for that volume. I didn't set it up
that way because I don't have nearly that volume and I didn't want to
go through the hassle of setting it up.

--
Bowie
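
To make the difference between the tagging threshold and the auto-learn thresholds concrete, here is a small Python sketch. It is a simplified, hypothetical model, not SpamAssassin's own code: the names are made up, it uses one score for every decision (SpamAssassin computes the auto-learn score somewhat differently, for example leaving out the Bayes rules themselves), and the numbers are SpamAssassin 3.x defaults plus the 4.5 threshold from the question.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of the thresholds discussed in this thread.
# Names are illustrative only; numeric defaults mirror SpamAssassin 3.x.
REQUIRED_SCORE = 4.5     # this site's required_hits (tagging threshold)
LEARN_SPAM_AT = 12.0     # bayes_auto_learn_threshold_spam default
LEARN_HAM_BELOW = 0.1    # bayes_auto_learn_threshold_nonspam default
MIN_HEADER_POINTS = 3.0  # header points needed before auto-learning spam
MIN_BODY_POINTS = 3.0    # body points needed before auto-learning spam
MIN_SPAM_LEARNED = 200   # bayes_min_spam_num default
MIN_HAM_LEARNED = 200    # bayes_min_ham_num default


@dataclass
class ScanResult:
    score: float          # total score (simplified: one score for everything)
    header_points: float  # points contributed by header rules
    body_points: float    # points contributed by body rules


def is_marked_spam(r: ScanResult, required: float = REQUIRED_SCORE) -> bool:
    """Tagging decision: only required_hits matters here."""
    return r.score >= required


def autolearn_decision(r: ScanResult) -> str:
    """Return 'ham', 'spam' or 'none' (left for manual training)."""
    if r.score < LEARN_HAM_BELOW:
        return "ham"
    if r.score >= LEARN_SPAM_AT:
        # A high score alone is not enough: header rules and body rules
        # must each have contributed at least 3 points.
        if r.header_points >= MIN_HEADER_POINTS and r.body_points >= MIN_BODY_POINTS:
            return "spam"
    return "none"


def bayes_active(nspam: int, nham: int) -> bool:
    """Bayes only contributes to scores once both minimums are reached."""
    return nspam >= MIN_SPAM_LEARNED and nham >= MIN_HAM_LEARNED


if __name__ == "__main__":
    msg = ScanResult(score=5.2, header_points=2.0, body_points=3.2)
    # Tagged as spam at 4.5, but well below 12.0, so it is not auto-learned.
    print(is_marked_spam(msg), autolearn_decision(msg))
```

With these numbers a message scoring 5.2 is tagged as spam but never auto-learned, which is the situation the 3 header / 3 body question is about: if it really is spam, it has to be learned by hand.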
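
For the manual route, training is done with the sa-learn tool: --spam for missed spam and --ham for mail that was wrongly tagged, optionally with --mbox when the messages are in an mbox file. The sketch below only builds the command line and prints it unless dry_run is turned off. The helper names and the folder path in the example are hypothetical, and it assumes misclassified mail is collected into separate folders somewhere.

```python
import subprocess
from typing import List


def sa_learn_command(kind: str, path: str, mbox: bool = False) -> List[str]:
    """Build an sa-learn command line (hypothetical helper).

    kind: "spam" for missed spam, "ham" for legitimate mail tagged as spam.
    path: a message file or a directory of messages, or an mbox file
          when mbox=True.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn", "--" + kind]
    if mbox:
        cmd.append("--mbox")
    cmd.append(path)
    return cmd


def train(kind: str, path: str, mbox: bool = False, dry_run: bool = True) -> int:
    """Print the command (dry run) or actually run sa-learn."""
    cmd = sa_learn_command(kind, path, mbox)
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical folder where users drop spam that got through.
    train("spam", "/home/user/Maildir/.missed-spam/cur")
```

Running it as-is prints `sa-learn --spam /home/user/Maildir/.missed-spam/cur`. Feeding both folders regularly is what keeps a manually trained Bayes database current as the spam changes.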
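
500,000 messages a day is roughly 5.8 messages per second on average (500,000 / 86,400 ≈ 5.79), before allowing for peaks, so contention on the Bayes store is a real concern. Either choice ends up as a few lines in local.cf. The Python sketch below just renders those lines for the two options in this thread: a MySQL Bayes store, or the default file-based store with learning written to the journal. The option names are real SpamAssassin settings; the function name, DSN, user name and password are placeholders, and the database tables still have to be created separately following the SpamAssassin SQL documentation.

```python
def render_bayes_config(backend: str, dsn: str = "", user: str = "",
                        password: str = "") -> str:
    """Render local.cf lines for a Bayes backend (hypothetical helper)."""
    if backend == "mysql":
        lines = [
            # SQL store: many scanners can update Bayes concurrently.
            "bayes_store_module Mail::SpamAssassin::BayesStore::MySQL",
            f"bayes_sql_dsn {dsn}",
            f"bayes_sql_username {user}",
            f"bayes_sql_password {password}",
        ]
    elif backend == "dbm":
        # File-based store: write learned tokens to the journal and merge
        # them later, so the main database is locked less often.
        lines = ["bayes_learn_to_journal 1"]
    else:
        raise ValueError("backend must be 'mysql' or 'dbm'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder connection details.
    print(render_bayes_config("mysql", "DBI:mysql:sa_bayes:localhost",
                              "sa_user", "change_me"), end="")
```

The trade-off with the journal is a short delay: learned tokens only take effect once the journal is synced into the database (by `sa-learn --sync` or automatically), whereas the SQL store applies updates directly.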