RE: Bayes Advise and Question ?

Vahric MUHTARYAN Wed, 22 Feb 2006 06:56:23 -0800

Thank you sir :) 

-----Original Message-----
From: Bowie Bailey [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, February 22, 2006 4:35 PM
To: users@spamassassin.apache.org
Subject: RE: Bayes Advise and Question ?

Vahric MUHTARYAN wrote:
> 
> I read some articles about Bayes and something is not clear to me,
> so I need advice from SpamAssassin people.
> 
> I'm using SpamAssassin rules, some SARE rules, and Razor, and I'm
> happy with spam detection. At first I thought that I should disable
> autolearning and manually train Bayes,

Quite a few people will tell you that this is the best method, but
if you do manual training, you have to keep training it.  You are never
finished with the training because the spams keep changing.

> but after some reading I saw that
> some commercial products say Bayes must train for at least 2 weeks,
> and the SpamAssassin default is 200 ham and 200 spam messages.
> Before, I was thinking of setting the ham and spam minimums very low
> and training SpamAssassin only with spam that it failed to detect ...
> is that the right idea?

No, you have to train with both spam and ham so that Bayes can learn to
tell the difference.

> But I saw that I have to train with ham and spam together
> because the same words can appear in both spam and ham .... What
> do you advise? Should I train Bayes manually, or automatically,
> giving it a long time before trusting it?

That is debatable.  I would suggest that you train it manually with
every email that comes through your system for a while.  Once you get to
200 ham and 200 spam and it starts working for you, you can switch to
either automatic learning, or continue manual learning with just the
messages that are scored wrong.
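
As a rough illustration of the 200 ham / 200 spam starting point, here is a
minimal Python sketch that reports how many more messages of each kind are
still needed. The function names are hypothetical, and the counts are assumed
to be supplied by you (for example, read off the output of
`sa-learn --dump magic`); this is not how SpamAssassin itself tracks them.

```python
def training_shortfall(nham, nspam, min_ham=200, min_spam=200):
    """Return how many more ham and spam messages must be learned.

    nham / nspam are the counts learned so far (supplied by the caller).
    The 200/200 minimums are the defaults mentioned in this thread.
    """
    return {
        "ham": max(0, min_ham - nham),
        "spam": max(0, min_spam - nspam),
    }


def bayes_ready(nham, nspam, min_ham=200, min_spam=200):
    """True once both minimums have been reached."""
    missing = training_shortfall(nham, nspam, min_ham, min_spam)
    return missing["ham"] == 0 and missing["spam"] == 0


if __name__ == "__main__":
    # Example: plenty of spam learned, but still short on ham.
    print(training_shortfall(150, 420))
    print(bayes_ready(150, 420))
```

The point of checking both counts is that a large pile of learned spam does
not make up for too little ham; both minimums have to be met.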

> My system's spam score threshold is 4.5, so it seems that
> "bayes_auto_learn_threshold_spam" must be set to 4.5, right? And
> if I set it to 4.5, what will the header and body requirements be
> for it to work?

No, those are two separate settings.

The spam threshold (required_hits, called required_score in
SpamAssassin 3.x) is the number of points needed before SpamAssassin
will mark a message as spam.

bayes_auto_learn_threshold_spam is the number of points needed before
Bayes will learn a message as spam.  This should be higher than your
required hits to avoid learning false positives as spam.  Unless you
have a reason to distrust the default setting (12.0), I wouldn't
change it.

bayes_auto_learn_threshold_nonspam is the maximum score for a message
that Bayes will learn as ham (or nonspam).  This defaults to 0.1, but
some people suggest that you should drop it to 0 or even -0.1 to avoid
learning false negatives as ham.

> And I guess if a message doesn't meet the 3 header / 3 body
> requirement, then I have to train the system manually, right?

Right.
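
To make the relationship between these settings concrete, here is a minimal
Python sketch of the auto-learn decision as described in this thread. It is a
simplified model, not SpamAssassin's actual code: the function name and
parameters are hypothetical, the exact handling of scores that land right on a
threshold is an assumption, and the real auto-learn score is computed
differently from the score used for tagging.

```python
from typing import Optional


def autolearn_decision(
    score: float,
    header_points: float,
    body_points: float,
    spam_threshold: float = 12.0,    # bayes_auto_learn_threshold_spam default
    nonspam_threshold: float = 0.1,  # bayes_auto_learn_threshold_nonspam default
    min_header: float = 3.0,         # the "3 header" requirement from the thread
    min_body: float = 3.0,           # the "3 body" requirement from the thread
) -> Optional[str]:
    """Return "spam", "ham", or None (leave the message for manual training)."""
    if score >= spam_threshold:
        # A high score alone is not enough: both header and body rules
        # must have contributed at least the minimum number of points.
        if header_points >= min_header and body_points >= min_body:
            return "spam"
        return None
    if score < nonspam_threshold:  # boundary handling is an assumption
        return "ham"
    # Anything in between is not auto-learned at all.
    return None


if __name__ == "__main__":
    print(autolearn_decision(15.0, 4.0, 5.0))   # high score, both minimums met
    print(autolearn_decision(15.0, 1.0, 14.0))  # high score, header points too low
    print(autolearn_decision(4.5, 2.0, 2.5))    # tagged as spam at 4.5, not learned
```

Note that a message scoring 4.5 would be tagged as spam with a threshold of
4.5 but falls well short of the auto-learn threshold, so it is one of the
messages that would be left for manual training.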

> Is anybody using the journal for Bayes learning? It solves the file
> locking problem. I think locking is not an issue for those using a
> database environment, right?

I don't think locking is an issue if you are using MySQL or another DB
to hold the Bayes database.  But then, I'm not using a database myself,
so I'm probably not the right person to answer this question.

> Which should we choose for Bayes learning, database or file?
> We are handling more than 500,000 mails a day!

Database is probably the way to go for that volume.  I didn't set it up
that way because I don't have nearly that volume and I didn't want to go
through the hassle of setting it up.

-- 
Bowie
