Jeff Mincy wrote:
> I agree with everything you wrote but only when bayes autolearning is
> turned off.  Bayes learning holds an exclusive lock to the bayes
> database particularly during expiration.

But the example was calling spamc.  Bayes autolearning would be
occurring on the spamd side of things.  The spamc shouldn't need to
know about it; the spamd side worries about that.  That is rather the
entire point of using the client-server model.  Otherwise one would
simply run the full perl spamassassin there instead.  (There are other
reasons for the client-server model too.  And yet more for running the
full perl spamassassin inline.  There is no canonical correct way.)
For my personal mail I run the full perl spamassassin.  For mailing
lists I run it through spamc-spamd.
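
For illustration, the two styles look roughly like this as procmail
recipes (a minimal sketch; real recipes usually add size limits and
lockfiles, which I leave out here):

    # Inline: run the full perl SpamAssassin as the filter.
    :0fw
    | spamassassin

    # Client-server: the lightweight spamc client hands the message to
    # an already-running spamd, here or on another host.
    :0fw
    | spamc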

And as John noted, it is much better to run sa-learn --expire as a
separate process, probably cron-driven, and not inline with the SA run.
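
As a sketch, that can be a nightly crontab entry along these lines
(the time is arbitrary; current sa-learn spells the option
--force-expire):

    # Expire old Bayes tokens once a night, outside the delivery path.
    30 3 * * *   /usr/bin/sa-learn --force-expire >/dev/null 2>&1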

> If spamc does bayes autolearning and starts an expiration then other
> spamc runs for that user will be locked out of bayes.  At some point
> you start getting timeouts at different points in the email delivery
> chain.

Any time supply (spamd) can't keep up with demand (spamc) there may be
timeouts and other failures.  The question is where those might occur.
In the suggested recipe a timeout between spamc and spamd would cause
spamc to exit with EX_TEMPFAIL (75), which would cause procmail to
exit the same way, which in turn would cause the MTA to requeue the
message and retry it later.
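
A commonly used procmail pattern for that behavior looks roughly like
this; it assumes spamc is run with -x (no safe fallback) so that a
failure to reach spamd is reported through the exit code:

    :0fw
    | spamc -x

    # If the filter failed (spamd down or timed out), exit with
    # EX_TEMPFAIL so the MTA keeps the message queued and retries.
    :0e
    {
      EXITCODE=75
      HOST
    }

The bare HOST assignment is the usual procmail idiom for stopping any
further processing, so the message is left to the MTA's queue instead
of being delivered unscanned.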

For spamc-spamd use I pump the mail off through spamc and let spamd
queue and process as fast as it can.  In my environment I am not
experiencing timeouts.  But if for some reason supply did not keep up
with demand then the message would simply be queued for retry later,
when resources may be available.  If the system were overloaded then
that is about the best that can be done anyway.  If that happened
often then increasing the compute resources on the spamd side would
allow it to keep up better.

Serializing spamc will definitely give spamd plenty of time between
messages so that the system won't be overloaded.  But if the system is
dedicated to handling mail and anti-spam then it won't be as highly
utilized as it could be.  Running more in parallel will usually utilize
resources more efficiently.

In my environment spamc and spamd are on different systems.  Therefore
making use of parallel compute resources is a good thing.  But if in
your environment everything happens on one single system then
serialization may be best.  It is your judgement call and every
environment is different.
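
For reference, in procmail the difference between the two is just a
local lockfile on the recipe (the lockfile name is only an example):

    # Serialized: the local lockfile allows only one spamc at a time.
    :0fw: spamc.lock
    | spamc

    # Parallel: no lockfile, so concurrent deliveries each run spamc.
    :0fw
    | spamc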

> I have a separate sa-learn (or spamc -L) procmail recipe that has a
> serialization lock.

I run sa-learn --expire from cron.  For learning I run spamc
--learntype=spam, using a different invocation that does not involve
procmail.
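
As an example of the kind of invocation I mean (the saved-message path
is purely illustrative, and spamd must be started with --allow-tell
for learning through spamc to work):

    # Feed a saved spam message to spamd's Bayes database for learning.
    spamc --learntype=spam < ~/Mail/junk/0123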

Bob
