On 12/11/2015 06:28 AM, Marc Perkel wrote:

On 12/10/15 18:31, Benny Pedersen wrote:
Marc Perkel wrote on 2015-12-10 22:54:
I've had Bayes disabled in SA because it doesn't seem able to keep
working in a high-volume setup. The MySQL server can't seem to
keep up with it, even on very fast computers.

I have a Palm Zire that can do OCR on handwritten text :=)

Pretty good for the kind of CPU it has.

But I'm thinking about trying something interesting: doing my own
Bayes in a different way.

I have tried bogofilter with very good success, and I see problems with
Bayes here as well. I remember you changed to MariaDB?

At that time you said it worked better than MySQL?

Did it fail again?

Here's my question.

Bayes breaks the message down into tokens of some sort and then
computes statistics on how often each token is found in spam vs.
in ham.
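For readers following along, here is a minimal sketch of that per-token statistic. It uses a Robinson-style smoothed estimate; the whitespace tokenizer, the `train`/`token_spam_probability` names, and the defaults `s=1.0`, `x=0.5` are illustrative assumptions, not SpamAssassin's or bogofilter's actual code or settings.

```python
from collections import Counter


def train(spam_msgs, ham_msgs):
    """Count, per token, how many spam and ham messages contain it.

    Lower-cased whitespace splitting is a deliberate simplification;
    real filters use far more elaborate tokenizers.
    """
    spam_counts = Counter(t for msg in spam_msgs for t in set(msg.lower().split()))
    ham_counts = Counter(t for msg in ham_msgs for t in set(msg.lower().split()))
    return spam_counts, ham_counts, len(spam_msgs), len(ham_msgs)


def token_spam_probability(token, model, s=1.0, x=0.5):
    """Robinson-style smoothed estimate that a message containing `token` is spam.

    s (strength of the prior) and x (prior for rarely seen tokens) are
    illustrative defaults for this sketch only.
    """
    spam_counts, ham_counts, n_spam, n_ham = model
    b, g = spam_counts[token], ham_counts[token]
    spam_freq = b / n_spam if n_spam else 0.0
    ham_freq = g / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0.0:
        return x  # never seen in training: fall back to the prior
    p = spam_freq / (spam_freq + ham_freq)
    n = b + g
    # The fewer times a token was seen, the more the estimate is pulled toward x.
    return (s * x + n * p) / (s + n)


model = train(["cheap pills now", "cheap watches now"],
              ["meeting now", "lunch meeting"])
for tok in ("cheap", "now", "meeting", "unseen"):
    print(tok, round(token_spam_probability(tok, model), 3))
```

Each token gets its own number independently of the others, which is exactly why combinations of tokens are not captured by this step.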

But what about combinations of tokens? I'd like to have something
that says: when it sees tokens X, Y, and Z together, that's spam, even
though X, Y, and Z might each appear in ham on their own.

Does Bayes do that, or is there anything that does?
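One way to get that behavior out of an ordinary per-token learner is to add synthetic tokens for co-occurrences, so "x&y&z" is trained and scored like any other token. This is a sketch of that idea only; the `&` joiner, `max_order`, and `max_tokens` are hypothetical choices, and nothing here claims SA's Bayes or bogofilter do this.

```python
from itertools import combinations


def with_combination_tokens(tokens, max_order=2, max_tokens=50):
    """Return the unique tokens plus synthetic tokens for their co-occurrences.

    max_order: largest combination size to emit (2 = pairs, 3 = triples, ...).
    max_tokens: cap on how many tokens take part in combinations, because
    n tokens already yield n*(n-1)/2 pairs. Which tokens are kept is
    arbitrary here (alphabetical); a real design would keep the most
    informative ones. Tokens that themselves contain '&' could collide.
    """
    uniq = sorted(set(tokens))
    capped = uniq[:max_tokens]
    combos = ["&".join(c)
              for k in range(2, max_order + 1)
              for c in combinations(capped, k)]
    return uniq + combos


print(with_combination_tokens(["z", "x", "y"], max_order=3))
```

The catch is size: the token database grows roughly quadratically (or worse, for triples) per message, which matters given the database throughput problems described earlier in the thread.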

If Z is scored as spam, and X and Y are scored as ham, then it's ham;
that's basically how Bayes works. But a single mail can produce a lot of
token digests to compare before it decides whether it's spam or not.
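How the per-token numbers are merged can be illustrated with chi-square (Fisher/Robinson) combining, the scheme popularized by Gary Robinson and used in SpamBayes. That SpamAssassin's Bayes uses a variant of it by default is my understanding, not something stated in this thread, so treat it as an assumption; the function names below are hypothetical.

```python
import math


def chi2q(x2, dof):
    """Survival function P(X >= x2) of a chi-square distribution with even dof."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_combine(probs, eps=1e-6):
    """Combine per-token spam probabilities into one score in [0, 1].

    Near 1 = spam, near 0 = ham, around 0.5 = unsure or conflicting evidence.
    """
    if not probs:
        return 0.5
    ps = [min(max(p, eps), 1.0 - eps) for p in probs]  # avoid log(0)
    n = len(ps)
    # Evidence that the tokens are jointly spammy / jointly hammy.
    spam = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in ps), 2 * n)
    ham = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in ps), 2 * n)
    return (1.0 + spam - ham) / 2.0


print(chi_combine([0.99, 0.01, 0.01]))  # Z spammy, X and Y hammy
```

With one strongly spammy token and two strongly hammy ones, the result comes out somewhat below 0.5 rather than near 0: the method reports conflicting evidence instead of taking a simple vote.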

Test bogofilter:

Put 1000000 spam mails in a spam folder.
Put 1000000 non-spam mails in a ham folder.

Train bogofilter with these 2 folders in one go, not first ham and then
spam; it must be done in a single bogofilter training call. Then configure
the bogofilter.cf plugin for SpamAssassin and test it :=)

YMMV



Yes, MariaDB was better than MySQL, but not good enough to keep up. I even
tried putting the database on a RAM disk, and it still didn't work.

I'm thinking about incorporating Bogofilter, but instead of feeding it
messages, I'd feed it the SpamAssassin results (the names of the rules
that hit, plus other data about the message) and let it score the
rules. That's what I want to experiment with.

Bogofilter was designed to be used with an MUA. Shelling out for each message can't be very efficient, and if you want to share the Bayes DB across several boxes, NFS doesn't seem like a fast option either.

Again, for speed and ease of use, SA's Redis backend can't be beat.
There's some help in
https://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/

Axb
