On Apr 21, 2008, at 8:40 AM, Chris St. Pierre wrote:
On Mon, 21 Apr 2008, Michael Parker wrote:

select * from bayes_vars;

...
2289 rows in set (0.00 sec)

What user do you run bayes under on your MXs?

I think you've found the issue.  We run as spamd.

# sa-learn -u spamd --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0    1492123          0  non-token data: nspam
0.000          0     660634          0  non-token data: nham
0.000          0   73178711          0  non-token data: ntokens
0.000          0 1189775610          0  non-token data: oldest atime
0.000          0 1208785034          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

That leads to two issues:

1.  I need to straighten things out and figure out why I've got a
strange mix of per-user and global data in my Bayes DB.  Whee.


If you want a global database, set bayes_sql_override_username and then run sa-learn -u <username> --clear for everyone else (a PITA, I know). Personally, I don't think individual Bayes DBs are a problem if you've got the disk space and CPU on your database machine. See below for some solutions.
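A minimal sketch of the cleanup step, assuming the shared Bayes user is spamd (as in this thread) and that usernames arrive one per line, for example from the bayes_vars table. The function name clear_per_user_bayes and the GLOBAL_USER and DRY_RUN variables are hypothetical; only sa-learn -u and --clear are real sa-learn options.

```shell
#!/bin/sh
# Sketch: clear every per-user Bayes DB except the shared (global) one.
# GLOBAL_USER and DRY_RUN are hypothetical knobs for this example.
GLOBAL_USER="${GLOBAL_USER:-spamd}"

# Reads usernames from stdin, one per line.
clear_per_user_bayes() {
    while IFS= read -r user; do
        # Skip blank lines and the global Bayes user.
        [ -z "$user" ] && continue
        [ "$user" = "$GLOBAL_USER" ] && continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: only show what would be executed.
            printf 'sa-learn -u %s --clear\n' "$user"
        else
            sa-learn -u "$user" --clear
        fi
    done
}
```

After sourcing it, something like `mysql -N -e 'SELECT username FROM bayes_vars' <dbname> | clear_per_user_bayes` would feed it, where `<dbname>` is your Bayes database. With DRY_RUN=1 it only prints the commands, which is worth reviewing before deleting anything. Since the override username may also apply to sa-learn when it reads the same local.cf, it is probably safer to run the clear before adding bayes_sql_override_username.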



2.  Does this mean that, if I use per-user Bayes, I have to run
expiration as each user individually?

Manual expiration was recommended to me a long time ago as a way to
increase database performance, but it seems like it may not be worth
it if I have to run N forced expirations, for potentially large values
of N.


This is true for DBM-based Bayes databases, but MySQL-based Bayes expiration is generally very fast (just a few seconds), with one exception I'll get to in a second. I would run a manual expire first to clear out the current backlog, and then turn auto-expire on.
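A sketch of the one-off backlog expiry for a single user, followed by a check of the "last expiry atime" counter from --dump magic (it reads 0 for spamd in the dump above, which suggests expiry had not yet run for that user). The helper names magic_value and expire_and_report are hypothetical. The value sits in the third column of each --dump magic line, as the nspam and ntokens lines show. Auto-expire itself is the bayes_auto_expire option in local.cf.

```shell
#!/bin/sh
# Sketch: force one expiry run for a Bayes user, then report the result.
# Function names are hypothetical; sa-learn flags are the real ones.

# Print the value (3rd column) of one "non-token data" line read from stdin.
# $1: the label to match, e.g. "last expiry atime" or "nspam".
magic_value() {
    awk -v label="non-token data: $1" 'index($0, label) { print $3; exit }'
}

# $1: Bayes username, e.g. spamd
expire_and_report() {
    user="$1"
    sa-learn -u "$user" --force-expire
    last=$(sa-learn -u "$user" --dump magic | magic_value "last expiry atime")
    if [ "${last:-0}" = "0" ]; then
        echo "$user: no expiry recorded"
    else
        echo "$user: last expiry atime $last"
    fi
}
```

Running `expire_and_report spamd` once, then setting `bayes_auto_expire 1` in local.cf, would follow the order suggested above.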

One reason expiration slows down is an unoptimized database. For my small setup, I've found that running optimization every couple of weeks gives much better performance. You seem to get a lot more traffic, so I would recommend running it more often. With frequent optimization and auto-expire, your database will stay in much better shape.
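A sketch of a periodic optimization job, assuming the standard SpamAssassin MySQL Bayes tables (bayes_token, bayes_seen, bayes_expire, bayes_vars), a database named spamassassin, and credentials in ~/.my.cnf. The database name, the function names and the BAYES_* variables are hypothetical.

```shell
#!/bin/sh
# Sketch: run MySQL's OPTIMIZE TABLE over the Bayes tables.
# BAYES_DB is a hypothetical database name; override it as needed.
BAYES_DB="${BAYES_DB:-spamassassin}"
BAYES_TABLES="bayes_token, bayes_seen, bayes_expire, bayes_vars"

# Print the SQL statement (handy for checking before running it).
optimize_sql() {
    printf 'OPTIMIZE TABLE %s;\n' "$BAYES_TABLES"
}

# Pipe the statement into the mysql client (reads ~/.my.cnf for credentials).
optimize_bayes() {
    optimize_sql | mysql "$BAYES_DB"
}
```

optimize_bayes could be called from a weekly (or more frequent) cron job. On MyISAM tables OPTIMIZE TABLE may lock each table while it runs, so an off-peak slot is probably best.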

Michael


Thanks for your help.

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University

