In the last episode (Jun 19), Andrew Daviel said:
> When I look at the processes, I see two or more copies of spamass-milter
> in sleep (S) state as well as the parent in sleep (Ss1) state.
>
> If I connect to one of the processes with gdb and do a backtrace, I
> typically see something like
>   in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>   in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
>   in _L_mutex_lock_29 () from /lib/tls/libc.so.6
>   in strdup () from /lib/tls/libc.so.6
>   in SpamAssassin::Connect (this=0x8bb01f8) at spamass-milter.cpp:1506
>   in mlfi_header ... at spamass-milter.cpp:1148
> from which I assume that two threads have got in a deadlocked state.
> Sometimes I see "debug" instead of "strdup".

Try running "thread apply all bt" to get stack traces of all threads at
once.  If it's really a deadlock, you should see at least one other hung
thread with a different stack trace.

> I have tried replacing localtime() and strerror(), which are not
> threadsafe on Linux, with localtime_r and strerror_r(), but that
> does not help.

Even though they're not threadsafe, they won't cause deadlocks.  You'll
just get the wrong time or the wrong error message.  localtime is only
used here to convert the current time, and strerror won't get called
unless there's already something else wrong.

> Elsewhere on the Web I see a comment that mutex lock may be caused
> by calling malloc or printf inside a signal handler. I don't think
> spamass-milter is a signal handler, though strdup and vsyslog would
> call malloc and printf, so it's a not-impossible explanation. I had
> earlier seen mutex_lock called from strlwr, but have now replaced
> the complex tolower() call with a much simpler 7-bit ASCII routine.

spamass-milter doesn't do any signal handling, so if it is a deadlock on
a malloc mutex, it might be within libmilter itself.  libmilter only
traps TERM, HUP, and INT, so under normal operation it shouldn't be in a
signal handler.

> The somewhat similar smf-clamd milter runs OK with no problem
> (similar in that it uses the same libraries and also passes mail to
> a daemon for processing).
>
> RHEL 4.3
> sendmail-8.13.1-3.2.el4.i386
> glibc-2.3.4-2.25.i686
> kernel 2.6.9-34.0.1.ELsmp
>
> (I doubt that my changes are directly responsible, because I've been playing
> with them without affecting the lock-up. Trying the stock milter on the
> production machine is an issue because the users expect their
> whitelists to work based on match_gecos - [EMAIL PROTECTED]
> -> user "juser")
> --
> Andrew Daviel, TRIUMF, Canada
> Tel. +1 (604) 222-7376 (Pacific Time)
> Network Security Manager
>
>
> _______________________________________________
> Spamass-milt-list mailing list
> [email protected]
> http://lists.nongnu.org/mailman/listinfo/spamass-milt-list

--
	Dan Nelson
	[EMAIL PROTECTED]
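
For reference, the signal-handler explanation quoted above works like this: malloc and printf take internal locks, so if a handler calls them while the interrupted thread already holds the same lock, the thread can block on itself. The usual way to avoid that is to have the handler only set a flag and do the real work later from normal context. The following is a minimal sketch of that pattern, not code from spamass-milter or libmilter. The names g_got_signal, on_signal, install_handler and consume_signal are made up for illustration, and SIGUSR1 is an arbitrary choice.

```cpp
#include <csignal>
#include <signal.h>   // sigaction, sigemptyset (POSIX)

// Hypothetical flag: the only state the handler is allowed to touch.
static volatile std::sig_atomic_t g_got_signal = 0;

// Async-signal-safe handler: no malloc, no printf, no syslog, no locks.
extern "C" void on_signal(int /*signum*/) {
    g_got_signal = 1;
}

// Install the handler for SIGUSR1 (picked arbitrarily for this sketch).
// Returns true if sigaction() succeeded.
bool install_handler() {
    struct sigaction sa {};
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, nullptr) == 0;
}

// Called from normal (non-handler) context, where calling malloc,
// strdup or syslog is fine. Returns true once per observed signal.
bool consume_signal() {
    if (g_got_signal) {
        g_got_signal = 0;
        return true;
    }
    return false;
}
```

A main loop would call install_handler() once at startup and then poll consume_signal() between units of work, doing any logging or allocation there rather than in the handler.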
