I have been running a modified version of spamass-milter-0.3.1
(match_gecos, per-user rejection threshold). It worked fine in testing,
but in production it jams up after a day or so. The milter continues to
run, but sendmail cannot connect to it and logs
"error connecting to filter". Sometimes there are a few messages such as
"Milter (spamassassin): to error state"
"milter_read(spamassassin): cmd read returned 0"
earlier in the log, though the milter continues to operate for a while
after those, maybe a couple of hours.

When I look at the processes, I see two or more copies of spamass-milter
in sleep (S) state, as well as the parent in sleep (Ssl) state.

If I connect to one of the processes with gdb and do a backtrace, I typically see something like
 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
 in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
 in _L_mutex_lock_29 () from /lib/tls/libc.so.6
 in strdup () from /lib/tls/libc.so.6
 in SpamAssassin::Connect (this=0x8bb01f8) at spamass-milter.cpp:1506
 in mlfi_header ... at spamass-milter.cpp:1148
from which I assume that two threads have deadlocked.
Sometimes I see "debug" in place of "strdup" in the backtrace.

I have tried replacing localtime() and strerror(), which are not
threadsafe on Linux, with localtime_r() and strerror_r(), but that does
not help.
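
For reference, the kind of replacement I mean looks roughly like the sketch below. This is not the actual patch: safe_strerror, safe_timestamp and pick_message are names made up for illustration, and the two pick_message overloads are only there because strerror_r() returns int in the XSI flavour and char* in the GNU flavour.

```cpp
#include <cassert>
#include <string>
#include <string.h>   // strerror_r()
#include <time.h>     // localtime_r(), strftime()

// XSI strerror_r() returns int (0 on success) and fills the caller's buffer.
inline const char* pick_message(int rc, const char* buf) {
    return rc == 0 ? buf : "unknown error";
}

// GNU strerror_r() returns a pointer to the message, which may or may not
// be the caller's buffer.
inline const char* pick_message(const char* msg, const char* /*buf*/) {
    return msg;
}

// Hypothetical helper: error text without touching strerror()'s shared
// static buffer.
std::string safe_strerror(int errnum) {
    char buf[256] = "";
    return std::string(pick_message(strerror_r(errnum, buf, sizeof buf), buf));
}

// Hypothetical helper: local-time stamp using a caller-owned struct tm
// instead of localtime()'s shared static one.
std::string safe_timestamp(time_t when) {
    struct tm parts;
    if (localtime_r(&when, &parts) == nullptr) {
        return std::string();
    }
    char out[32];
    size_t n = strftime(out, sizeof out, "%Y-%m-%d %H:%M:%S", &parts);
    assert(n > 0);  // 32 bytes is ample for this fixed format
    return std::string(out, n);
}
```

Each thread works on its own stack buffers here, so no two threads share the static storage that localtime() and strerror() use.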

Elsewhere on the Web I have seen a comment that a hang in mutex lock can be caused by calling malloc or printf inside a signal handler. I don't think the spamass-milter code in question runs inside a signal handler, though strdup and vsyslog would call malloc and printf, so it is not an impossible explanation. I had earlier seen mutex_lock called from strlwr, but I have since replaced the complex tolower() call with a much simpler 7-bit ASCII routine.
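
To illustrate the signal-handler comment (this is not code from spamass-milter, and the names g_signal_seen, on_usr1, install_usr1_handler and consume_signal_flag are invented for the example): the usual advice is that a handler should do nothing except set a flag, leaving anything that may call malloc or printf to ordinary code. A minimal sketch, assuming SIGUSR1 is the signal of interest:

```cpp
#include <cassert>
#include <signal.h>

// Flag written by the handler and read by ordinary code.
// volatile sig_atomic_t is the type that is safe to write from a handler.
static volatile sig_atomic_t g_signal_seen = 0;

// The handler only sets the flag: no malloc, no printf, no syslog,
// so it cannot block on a lock held by the code it interrupted.
static void on_usr1(int) {
    g_signal_seen = 1;
}

// Install the handler with sigaction(); returns true on success.
bool install_usr1_handler() {
    struct sigaction sa = {};
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, nullptr) == 0;
}

// Called from normal (non-handler) context, e.g. once per loop iteration.
// Returns true if a signal arrived since the last call; the caller can
// then log or allocate freely. A signal landing between the test and the
// reset is counted together with the earlier one, which is fine for a
// "something happened" flag.
bool consume_signal_flag() {
    if (!g_signal_seen) {
        return false;
    }
    g_signal_seen = 0;
    return true;
}
```

If a handler instead called strdup or vsyslog directly while the interrupted thread was inside malloc, it could wait forever on the allocator's lock, which would look much like the backtrace above.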

The somewhat similar smf-clamd milter runs with no problems (similar in
that it uses the same libraries and also passes mail to a daemon
for processing).

RHEL 4.3
sendmail-8.13.1-3.2.el4.i386
glibc-2.3.4-2.25.i686
kernel 2.6.9-34.0.1.ELsmp

(I doubt that my changes are directly responsible, because I have been
altering them without any effect on the lock-up. Trying the stock milter
on the production machine is difficult because the users expect their
whitelists to work based on match_gecos - [EMAIL PROTECTED]
-> user "juser")
--
Andrew Daviel, TRIUMF, Canada
Tel. +1 (604) 222-7376  (Pacific Time)
Network Security Manager


_______________________________________________
Spamass-milt-list mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/spamass-milt-list
