Theo Van Dinter wrote on Tue, 10 Feb 2004 11:44:24 -0500:

> FYI: For 3.0.0, I just put in some code that stops this kind of thing from
> happening (if the calculated message atime is determined to be more than
> 1 day in the future, it just uses the current time() value instead).
> If a 2.64 release happens, the fix will probably go in there too:
> http://bugzilla.spamassassin.org/show_bug.cgi?id=3025
>

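For reference, the safeguard described above could be sketched roughly like this. It is only an illustration in Python, not the actual SpamAssassin (Perl) code; the names `sane_atime` and `ONE_DAY` are made up for this example.

```python
import time

ONE_DAY = 24 * 60 * 60  # seconds in a day


def sane_atime(msg_atime, now=None):
    """Hypothetical helper: return msg_atime unless it lies more than
    one day in the future, in which case use the current time instead."""
    if now is None:
        now = int(time.time())
    if msg_atime > now + ONE_DAY:
        return now
    return msg_atime
```

A message dated far in the future would then no longer push the newest atime in the db ahead of the real clock.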
I think I'm hitting the same problem:

debug: bayes: found bayes db version 2
debug: bayes: expiry check keep size, 75% of max: 112500
debug: bayes: token count: 638040, final goal reduction size: 525540
debug: bayes: First pass?  Current: 1076602270, Last: 1076601983, atime: 0, count: 0, newdelta: 0, ratio: 0
debug: bayes: Can't use estimation method for expiry, something fishy, calculating optimal atime delta (first pass)

If I understand correctly, the database should hold only about 112,500 
tokens after an expiry (75% of the maximum, which must be the 2.63 default), 
so expiry has been failing for quite some time if the db is now at over 
600,000 tokens.
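
The numbers in the debug output can be checked with a little arithmetic. This is just a sketch that recomputes them from the values logged above; the variable names are my own, and the maximum size is inferred from the "75% of max" wording rather than read from any config.

```python
token_count = 638040   # "token count" from the debug output
keep_size = 112500     # "expiry check keep size, 75% of max"

# If the keep size is 75% of the maximum, the maximum follows directly.
max_db_size = keep_size * 100 // 75

# Tokens that expiry would have to remove to get down to the keep size.
goal_reduction = token_count - keep_size

print(max_db_size, goal_reduction)
```

The second value matches the "final goal reduction size" in the log, so the db is more than four times larger than the implied maximum.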

The token reduction count stays at

debug: bayes: 43200     637929
debug: bayes: 22118400  637929

so, it would expire almost everything.
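
To make the two debug lines above easier to read: the first and last deltas differ by a factor of 512, which fits a delta that doubles at each step (the intermediate lines are not shown here, so that is an assumption). The sketch below also assumes that a token counts as expirable when its atime is older than some reference time minus the delta; `would_expire` and `reference_time` are hypothetical names, not SpamAssassin's.

```python
FIRST_DELTA = 43200      # 12 hours, first delta in the debug output
LAST_DELTA = 22118400    # last delta in the debug output

# Assumed schedule: double the delta until the last logged value.
deltas = []
delta = FIRST_DELTA
while delta <= LAST_DELTA:
    deltas.append(delta)
    delta *= 2


def would_expire(token_atimes, reference_time, delta):
    """Count tokens whose atime is older than reference_time - delta
    (assumed expiry rule, for illustration only)."""
    cutoff = reference_time - delta
    return sum(1 for atime in token_atimes if atime < cutoff)


for d in deltas:
    print(d, d / 86400, "days")
```

Under that assumption, a count that stays the same for every delta is what you would get if nearly all tokens look older than the largest delta (256 days) relative to the reference time.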
What does this mean? That most tokens fall within the same time range, or 
that most tokens are far too old? How can I figure this out?
This db was started around summer/autumn last year with some learning and 
has been growing continually since then; it now holds around 17,000 spam and 
3,000 ham. I'm not sure what the following output means. Does it help to 
explain the above?

0.000          0     -17982          0  non-token data: newest atime
0.000          0 1076601982          0  non-token data: last journal sync atime
0.000          0 1076602431          0  non-token data: last expiry atime
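
The atime values above are Unix timestamps, so they can be decoded into readable dates. A small sketch, with the three values copied from the output above (the helper name `atime_to_utc` is my own):

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def atime_to_utc(atime):
    """Convert seconds since the Unix epoch to a naive UTC datetime.
    Uses timedelta so that negative values work on every platform."""
    return EPOCH + timedelta(seconds=atime)


magic = {
    "newest atime": -17982,
    "last journal sync atime": 1076601982,
    "last expiry atime": 1076602431,
}

for label, atime in magic.items():
    print(f"{label:24} {atime:>11}  {atime_to_utc(atime):%Y-%m-%d %H:%M:%S} UTC")
```

The journal sync and expiry values decode to 12 Feb 2004, while the negative "newest atime" decodes to a moment shortly before 1 Jan 1970, which cannot be a real message time.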

I have "fixed" this for now by setting
bayes_expiry_max_db_size 1000000

Is there a way I can sanitize the db? I don't really want to throw it away.

The interesting thing is that I have this problem on two machines, but it 
was noticeable on only one of them. On the first machine we use a milter 
(MailCorral) which hands the mail over to spamd, with a time-out of 60 
seconds. I didn't notice any increase in spam or other problems on that 
machine. Since MailCorral is no longer actively developed, I'm looking for 
alternatives, so I set up MailScanner + SA on another machine, copied the 
old Bayes db and other SA files over, and now keep sending a small portion 
of the spamtrap spam we get directly to that machine. Almost immediately I 
had a lot of SA time-outs, and searching the list I finally found the 
articles about the "fishy" atime delta. MailScanner uses a shorter time-out 
by default (20 seconds or so, I think), which I haven't changed. So one 
could imagine that the problem went undetected on the first machine because 
the longer time-out allowed the hanging expiry to finish. However, this 
doesn't seem to be the case: most of the time the spamd result comes back 
after a few seconds, and I see few if any spamd time-outs in the logs of the 
first machine. Is there some difference between spamd and SA, so that the 
problem exists with both but only becomes visible with SA and not with 
spamd? For instance, that spamd doesn't attempt the auto-expire with every 
message but only once a day, while it happens with each invocation of 
spamassassin?


Kai

-- 

Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org


