Michal Jeczalik wrote:
On Mon, 12 Mar 2007, Daryl C. W. O'Shea wrote:
After upgrading from 3.1.7 I have numerous problems with my spamd. It
hangs up during high load and becomes permanently unresponsive.
Following advice I found on the devel list, I'm now using
--round-robin and it hangs less often. But now I have a lot of
~/.spamassassin/bayes_toks.expire[pid] lockfiles that don't
disappear and quickly eat up the users' quota. It's interesting that on
another host with similar load conditions everything works ok. Anyway
- am I the only one experiencing these problems? There's no rumour on
the devel list, there's no rumour here - what's wrong? :) In this
situation 3.1.8 is quite unusable for me and I'm thinking about
downgrading. The only reason I have not done it already is that I'm not
sure if this is a simple task - my users won't stand another
spamassassin blackout, after numerous spam floods due to those
hang-ups in the past couple of days. ;-)
This has nothing to do with 3.1.8 specifically. The same thing would
happen with 3.1.7. Reverting to an earlier SA version will do nothing
for you.
spamd isn't "hanging up", it's doing bayes expiries, as you can tell
from having the bayes_toks.expire* lock files left after you kill off
the child process(es) doing the expiry. Since you're killing off the
expiries before they complete, this will (of course) keep happening.
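To see how many interrupted expiries have left files behind, something
like the following could be used. It is only a sketch: it assumes the
files are named bayes_toks.expire followed directly by the numeric pid
of the child that created them (as the path quoted above suggests), and
it only lists files whose pid is no longer running. It does not delete
anything; the function names are made up for this example.

```python
import os
import re
from pathlib import Path

# Assumption: the suffix after "bayes_toks.expire" is the bare numeric pid.
EXPIRE_RE = re.compile(r"^bayes_toks\.expire(\d+)$")


def pid_is_running(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def leftover_expire_files(sa_dir, is_running=pid_is_running):
    """List expire files in sa_dir whose owning pid is no longer alive."""
    leftovers = []
    for path in sorted(Path(sa_dir).iterdir()):
        match = EXPIRE_RE.match(path.name)
        if match and not is_running(int(match.group(1))):
            leftovers.append(path)
    return leftovers
```

Calling leftover_expire_files on a user's ~/.spamassassin directory
(expanded with os.path.expanduser) gives the candidates to look at.
Whether it is safe to remove them is a separate decision, and pids get
reused, so a "running" result does not prove the expiry is still alive.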
If your system is too loaded to deal with bayes auto expiries, disable
bayes_auto_expire and then schedule them to be done via a cron job
using sa-learn --force-expire -u username.
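As a rough illustration of the cron approach: with "bayes_auto_expire 0"
set in the SpamAssassin configuration, a small script run from cron
could call sa-learn for one user at a time, so that only a single
expiry is ever in progress. The sketch below assumes sa-learn is on the
PATH and that the script runs as a user allowed to use -u; the function
names and the way the user list is supplied are invented for the
example.

```python
import subprocess


def expire_command(username):
    """Build the sa-learn command line for one user's forced expiry."""
    return ["sa-learn", "--force-expire", "-u", username]


def run_serial_expiries(usernames, runner=subprocess.call):
    """Run one expiry at a time; return each user's exit status.

    runner defaults to subprocess.call, which blocks until the command
    finishes, so the next expiry never starts before the previous one
    is done.
    """
    results = {}
    for name in usernames:
        results[name] = runner(expire_command(name))
    return results
```

A cron entry would then invoke a wrapper that passes the real list of
users, for example run_serial_expiries(["alice", "bob"]) with
placeholder names, at a quiet time of day. A non-zero status for a user
means that expiry should be checked by hand.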
BTW - if it hangs up, it hangs up *completely* until I restart it. If it
goes down at midnight, then spamd is unresponsive until 8am when I get up
and do something. There are no log messages during this period. It's
*dead* in the full meaning of this word. :) So I'm not as sure as you
that it's only a matter of auto expire - would a single autoexpire task
lock up a frontend process for so long?!
If it's as busy as you said it was, "hangs up during high load", and
all/most of the children are trying to do expiries it could take months
to complete -- especially if you don't have the physical memory to do it
(read: a whole lot of RAM if multiple expiries are happening).
Disable auto expiry, do serialized expiries via cron, and see if the
problem stops. Actually, you don't even need to do the expiries to stop
the problem, just disable auto expiries. If spamd stops "hanging" then
it's the auto expiries causing the problem.
Experience tells me that if the spamd children are actually using CPU
time and they're not spewing errors all over your syslog, then it's an
expiry issue.
Daryl