Michal Jeczalik wrote:
On Mon, 12 Mar 2007, Daryl C. W. O'Shea wrote:

After upgrading from 3.1.7 I have numerous problems with my spamd. It hangs during high load and becomes permanently unresponsive. Following advice I found on the devel list, I'm using --round-robin now, and it hangs less often. But now I have a lot of ~/.spamassassin/bayes_toks.expire[pid] lock files that never go away and quickly fill up users' quotas. Interestingly, on another host with similar load everything works fine. Anyway, am I the only one experiencing these problems? Nobody's mentioning it on the devel list, nobody's mentioning it here. What's wrong? :) In this situation 3.1.8 is quite unusable for me and I'm thinking about downgrading. The only reason I haven't done it already is that I'm not sure it's a simple task, and my users won't stand another SpamAssassin blackout after the numerous spam floods caused by these hangs in the past couple of days. ;-)

This has nothing to do with 3.1.8 specifically. The same thing would happen with 3.1.7. Reverting to an earlier SA version will do nothing for you.

spamd isn't "hanging up", it's doing Bayes expiries, as you can tell from the bayes_toks.expire* lock files left behind when you kill off the child process(es) doing the expiry. Since you're killing the expiries before they complete, this will (of course) keep happening.

If your system is too loaded to deal with bayes auto expiries, disable bayes_auto_expire and then schedule them to be done via a cron job using sa-learn --force-expire -u username.
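For illustration, a minimal sketch of that setup. It assumes a site-wide local.cf (its location varies by install; /etc/mail/spamassassin/local.cf is common) and uses "alice" and "bob" as placeholder user names. The loop runs sa-learn for one user at a time, so only a single expiry is ever in progress:

```
# local.cf: stop spamd children from running opportunistic Bayes expiries
bayes_auto_expire 0

# root crontab: expire each user's Bayes DB in turn at 03:30.
# "alice" and "bob" are placeholder user names; substitute your own.
30 3 * * * for u in alice bob; do sa-learn --force-expire -u "$u"; done
```

spamd only reads its configuration at startup, so restart it after editing local.cf. Note that -u makes sa-learn act on behalf of that user rather than switching to that account; with per-user file databases under ~/.spamassassin you may need to run sa-learn as each user instead.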

BTW, when it hangs, it hangs *completely* until I restart it. If it goes down at midnight, spamd stays unresponsive until 8am, when I get up and do something about it. There are no log messages during this period. It's *dead* in the full meaning of the word. :) So I'm not as sure as you are that it's only a matter of auto expiry. Would a single auto-expire task really lock up a front-end process for that long?!

If it's as busy as you said ("hangs up during high load") and all or most of the children are trying to do expiries, they could take months to complete, especially if you don't have the physical memory for it (read: a whole lot of RAM if multiple expiries are running at once).

Disable auto expiry, do serialized expiries via cron, and see if the problem stops. Actually, you don't even need to do the expiries to stop the problem; just disable auto expiry. If spamd stops "hanging", then the auto expiries were causing the problem.

Experience tells me that if the spamd children are actually using CPU time and they're not spewing errors all over your syslog, then it's an expiry issue.


Daryl
