I am currently running SpamAssassin 2.60, and I migrated the Bayes
database from 2.55 according to the instructions. However, SA still
times out while using the Bayes database, and someone else concluded
earlier that the Bayes checks become very slow when the database holds
too many tokens.

So I investigated a bit, and it now seems that expiration does not work
properly. The default max token limit is 150000, and my database
currently holds 161438 tokens. When I run "sa-learn --force-expire -D",
I see the following lines:

debug: bayes: expiry check keep size, 75% of max: 112500
debug: bayes: token count: 161438, final goal reduction size: 48938
debug: bayes: First pass?  Current: 1065075477, Last: 1065043573, atime: 0, count: 0, newdelta: 0, ratio: 0
debug: bayes: something fishy, calculating atime (first pass)
debug: bayes: couldn't find a good delta atime, need more token difference, skipping expire.
debug: Syncing complete.
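
For reference, the numbers in the first two debug lines can be reproduced
with simple arithmetic. This is only a sketch of that arithmetic, not
SpamAssassin's actual expiry code; the function and variable names are
made up for illustration, and the 75% keep fraction is taken from the
debug line itself.

```python
# Illustrative only: reproduces the figures from the debug output,
# it is NOT SpamAssassin's implementation.
MAX_DB_SIZE = 150000   # default max token limit mentioned in this post
TOKEN_COUNT = 161438   # "token count" from the debug output


def expiry_goal(max_size, token_count, keep_fraction=0.75):
    """Return (keep_size, goal_reduction) for the given limits.

    keep_fraction=0.75 mirrors the "75% of max" wording in the log.
    """
    keep_size = int(max_size * keep_fraction)
    # Nothing needs to be removed if the database is already small enough.
    goal_reduction = max(token_count - keep_size, 0)
    return keep_size, goal_reduction


keep_size, goal_reduction = expiry_goal(MAX_DB_SIZE, TOKEN_COUNT)
print(f"keep size: {keep_size}, goal reduction: {goal_reduction}")
```

So the target sizes themselves look consistent; the expire is skipped
later, at the "couldn't find a good delta atime" step.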

The expiry code seems to be having some kind of problem here. The
database has been accumulated slowly from a live feed of email over
several weeks. Here is the beginning of the "sa-learn --dump" output:

0.000          0          2          0  non-token data: bayes db version
0.000          0       1912          0  non-token data: nspam
0.000          0       2623          0  non-token data: nham
0.000          0     161438          0  non-token data: ntokens
0.000          0 1063730702          0  non-token data: oldest atime
0.000          0 1065073789          0  non-token data: newest atime
0.000          0 1065075452          0  non-token data: last journal sync atime
0.000          0 1065075609          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
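
To put the atime values in perspective, here is a small sketch that
converts the oldest and newest atime to dates and computes the span
between them. It assumes the values are Unix epoch seconds; the
variable and helper names are only illustrative.

```python
from datetime import datetime, timezone

# Values copied from the dump above.
# Assumption: they are Unix epoch seconds.
oldest_atime = 1063730702
newest_atime = 1065073789

span_seconds = newest_atime - oldest_atime
span_days = span_seconds / 86400  # 86400 seconds per day


def as_utc(ts):
    """Format an epoch timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


print("oldest atime:", as_utc(oldest_atime))
print("newest atime:", as_utc(newest_atime))
print(f"span: {span_seconds} s = {span_days:.1f} days")
```

By this arithmetic, the tokens span roughly 15.5 days of access times.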

Can anybody shed some light on whether this is correct behaviour? My
main problem is that the Bayes checks are slow enough to cause timeouts,
which might be due to the database being too large. Any other
explanations are also welcome!

Regards,
        Kai


