Re: bayes, numbers of tokens and performance

Kevin Parris Fri, 19 Mar 2010 08:05:23 -0700

It doesn't really work that way.  Bayes is just one part of the picture, and
in order to get good results you have to turn the full toolkit loose on the
problem; I'm not sure Bayes by itself should be expected to achieve 95%
recognition anyway.  The main flaw in your current plan is that once you
re-activate the BLs, your Bayes content will begin to get stale - and
effectiveness is then likely to decline over time.  Bayes tends to work better
when trained continuously on current traffic.  Rather than switching off your
other tools just to get some spam to train with, perhaps you should focus on
training Bayes more actively with the spam that still gets through.
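
A minimal sketch of that kind of ongoing training, assuming you collect
missed spam and confirmed good mail in two folders.  The folder paths and the
train helper are hypothetical, made up for illustration; only sa-learn and its
--spam, --ham and --dump options come from SpamAssassin itself.

```shell
#!/bin/sh
# Hypothetical folders: adjust to wherever missed spam and known-good mail
# are actually filed on your system.
SPAM_DIR="${SPAM_DIR:-$HOME/Maildir/.missed-spam/cur}"
HAM_DIR="${HAM_DIR:-$HOME/Maildir/.confirmed-ham/cur}"

# train <--spam|--ham> <directory>
# Runs sa-learn on the directory; prints a "skipped" line instead if the
# directory does not exist or sa-learn is not installed.
train() {
    if [ -d "$2" ] && command -v sa-learn >/dev/null 2>&1; then
        sa-learn "$1" "$2"
    else
        echo "skipped: sa-learn $1 $2"
    fi
}

train --spam "$SPAM_DIR"
train --ham "$HAM_DIR"

# Show the Bayes database summary (learned spam/ham counts, token count).
if command -v sa-learn >/dev/null 2>&1; then
    sa-learn --dump magic
fi
```

Run from cron, something like this would keep Bayes fed with current traffic
while the BLs stay switched on.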

You're not likely ever to detect ALL the spam traffic, no matter what 
combination of tools you deploy - there will always be clever spammers working 
on ways to bypass them.

>>> tonjg <t...@freeuk.com> 03/18/10 11:04 AM >>>

Matus UHLAR - fantomas wrote:
> 
>> DNS available?
>> no
> 
> well, why? DNS helps very much for catching spam. all blacklists use DNS
> (afaik)

Sorry, when you said DNS I didn't know you were referring to the DNSBLs. I
know the blacklists are excellent for filtering spam, but I've got those
switched off so I can actually accumulate some spam for sa-learn. I
figured if I get SpamAssassin working really well first (i.e. a 95% success
rate), I would then switch the BLs back on and use both.
