Linda Walsh wrote:

Matt Kettler wrote:
The fact that they keep lying around is a problem. It suggests SA keeps getting killed before the expire can complete. Do you have any limits set, such as CPU time or memory, that SA might be running up against and dying?

You can try kicking off an expire manually using sa-learn --force-expire (add -D if you want some debug output). Note: this could run for a long time, particularly if bayes_toks is really large.
----
    Another one of the files appeared -- 17M long,
while bayes_toks is 8.8M.
That's quite reasonable. SA should be able to crank through that reasonably quickly. It strikes me as odd that the .expire file got larger than the toks file, but I might be missing something about how the process works.
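
To see how many of these leftovers have piled up and how each compares with bayes_toks, something along these lines may help. It is only a sketch: it assumes the leftover files sit in the same directory as bayes_toks and are named bayes_toks.expire plus some suffix, and the function name is made up for illustration.

```python
from pathlib import Path


def leftover_expire_files(sa_dir, toks_name="bayes_toks"):
    """Return (toks_size, leftovers) for one SpamAssassin user directory.

    toks_size is the size of the token database in bytes, or None if it
    is missing. leftovers is a sorted list of (file name, size in bytes)
    for every regular file whose name starts with toks_name + ".expire".
    The ".expire" naming is an assumption, not taken from SA's docs.
    """
    sa_dir = Path(sa_dir)
    toks = sa_dir / toks_name
    toks_size = toks.stat().st_size if toks.exists() else None
    leftovers = sorted(
        (p.name, p.stat().st_size)
        for p in sa_dir.glob(toks_name + ".expire*")
        if p.is_file()
    )
    return toks_size, leftovers
```

Calling leftover_expire_files("/home/user/.spamassassin") and printing the result would show at a glance whether any leftover is bigger than bayes_toks itself.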

auto-whitelist is 78M -- that seems a bit excessive...
Yeah, the AWL has no expire process, other than manually running the check-whitelist script on it. (found at: http://svn.apache.org/repos/asf/spamassassin/branches/3.2/tools/ )


Don't know what really large means -- bayes_toks isn't that large
compared to some of the other files.  No limits that I know of...
Ahh...seeing some oddness in the log though:
(Interrupted?    timeouts?...weird...)...

Jul 25 15:24:48 Ishtar spamd[2443]: bayes: cannot open bayes databases /home/user/.spamassassin/bayes_* R/W: lock failed: Interrupted system call
Jul 25 15:28:21 Ishtar spamd[2355]: bayes: expire_old_tokens: child processing timeout at /usr/bin/spamd line 1085, <GEN6> line 22.
The R/W lock failures are not surprising at all. If an expire process is running, other scanners will fail to get a write lock on the bayes DB. That's not really a big deal; all it means is that autolearning can't occur during the expire run. Rather than logjam your mail queue, SA just moves on and skips the autolearning, treating it as better to process mail in a timely fashion than to wait so that every possible automatic learn gets performed. The same thing happens when two messages try to autolearn at the same time: only one gets the R/W lock, and the other moves on.

Now if it can't get a R/O lock (read only), start to worry. But R/W lock failures happen fairly often.
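
For what it's worth, the "try for the write lock, and skip the learn if you can't get it" idea can be sketched generically like this. This is not SpamAssassin's own locking code; it just uses a plain non-blocking flock() on a Unix box to show the pattern, and try_learn and its arguments are made-up names.

```python
import fcntl
import os


def try_learn(lock_path, learn):
    """Run learn() only if an exclusive lock can be taken immediately.

    Returns True if learn() ran, False if the lock was busy and the
    learning step was skipped. Illustration only, Unix-specific.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # LOCK_NB makes flock() fail at once instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # lock is held elsewhere: skip learning, move on
        try:
            learn()
            return True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

The point of the non-blocking flag is the same trade-off described above: a busy lock costs one skipped learn rather than a stalled scanner.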

The child processing timeouts are a bigger deal; they are what's causing all the expire files to be left around. I can only guess this is caused by the fact that --timeout-child defaults to 300 seconds. However, I wouldn't expect that to apply to expire runs... Of course, I'm no expert here, as I don't use spamd (I use MailScanner, which loads the API directly).
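
If you want to see how often each of the two messages shows up, a quick tally over the mail log might look like the following. It's a rough sketch: the patterns are copied from the two log lines quoted above and may not match other SA versions, and the function and key names are invented for the example.

```python
import re
from collections import Counter

# Patterns taken from the two log lines quoted in this thread.
PATTERNS = {
    "rw_lock_failed": re.compile(
        r"bayes: cannot open bayes databases .* R/W: lock failed"
    ),
    "expire_timeout": re.compile(
        r"bayes: expire_old_tokens: child processing timeout"
    ),
}


def tally_bayes_events(lines):
    """Count how many log lines match each pattern in PATTERNS."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it an open log file (for example tally_bayes_events(open(path)), with path pointing at wherever your syslog puts spamd output) gives a count per message type; a steady stream of expire_timeout entries would fit with the leftover files.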

What version are you running? From reading around, the child processing timeout seems to have been a common problem in the 3.1.x series, but I've not seen it reported in the 3.2.x series.

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4650
