Hi Zope (and Python) experts!

There seems to be a problem when an external Python module segfaults during a Zope request: the remaining worker threads deadlock.
I think this is the same problem Dieter pointed out in his message to zope-dev, "[Problem] strange state after SIGSEGV":

http://mail.zope.org/pipermail/zope-dev/2004-March/022092.html

The cause is the way Python handles threads on some systems (RedHat 7.3, kernel 2.4.20, without NPTL). I've written a small Python extension that does nothing but segfault [1]. With it, I ran the following simulation, in which one thread acquires a lock and then segfaults:

#!/usr/bin/env python2.3
import thread
import time
import _segfault

_lock = thread.allocate_lock()

def worker():
    time.sleep(10)
    _lock.acquire()
    _segfault.segfault()    # crashes while holding _lock
    _lock.release()         # never reached

thread.start_new_thread(worker, ())
thread.start_new_thread(worker, ())
thread.start_new_thread(worker, ())
thread.start_new_thread(worker, ())
time.sleep(3600)
print 'Bye...'

On my RedHat 7.3 box (kernel 2.4.20-18, without NPTL) I get the following behaviour. After starting the program, pstree shows this:

bash(4103,wlang)---python2.3(4333)---python2.3(4334)-+-python2.3(4335)
                                                     |-python2.3(4336)
                                                     |-python2.3(4337)
                                                     `-python2.3(4338)

After the 10-second sleep, one worker gets the lock and segfaults. Afterwards, pstree shows this:

init(1)-+-[...]
        |-python2.3(4336,wlang)
        |-python2.3(4337,wlang)
        `-python2.3(4338,wlang)

Three worker threads remain (without the main thread). gdb shows that they are waiting for the lock, which they will never get:

(gdb) info stack
#0  0x420293d5 in sigsuspend () from /lib/i686/libc.so.6
#1  0x40031609 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x4003272c in sem_wait@@GLIBC_2.1 () from /lib/i686/libpthread.so.0
#3  0x080c7b2d in PyThread_acquire_lock (lock=0x8170728, waitflag=1)
                  ^^^^^^^^^^^^^^^^^^^^^
    at Python/thread_pthread.h:406
[...]

(On a side note: since Python threads block all signals, these worker threads cannot be stopped with SIGTERM; they must be killed with SIGKILL.)
All this has the consequences Dieter described:

> Consequences:
>
>  * Zope did no longer respond to requests
>
>  * "stop" did not work (as "SIGTERM" was ineffective)
>
>  * "start" did not work, as the dangling processes kept
>    the HTTP port bound.

So I think I know what's happening, but I don't know how to fix it. Can anyone help? Any hints are highly appreciated!

\wlang{}

PS: A RedHat 9 system (kernel 2.4.20, with NPTL) behaves differently: after the segfault, all threads disappear. So maybe everything is fine with NPTL, but I haven't tested it yet.

[1] segfault module

-segfault.c---------------
/* Write through a NULL pointer to trigger SIGSEGV. */
void segfault(void)
{
    char *x = 0;
    *x = 'a';
}

-segfault.i----------------
%module segfault
%{
extern void segfault(void);
%}
void segfault(void);

-building:------------------
$ swig -python segfault.i
$ gcc -I/usr/local/include/python2.3 -c segfault_wrap.c -o segfault_wrap23.o
$ gcc -c -o segfault.o segfault.c
$ gcc -shared segfault_wrap23.o segfault.o -o _segfault.so

-- 
[EMAIL PROTECTED]  Fax: +43/1/31336/9207
Zentrum fuer Informatikdienste, Wirtschaftsuniversitaet Wien, Austria

_______________________________________________
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
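Regarding the NPTL observation in the PS: one way to check it on a present-day Linux system is to run the simulation in a child process and look at how that process exits. This is a sketch under several assumptions, not the original test. It is a Python 3 port; `ctypes.string_at(0)` (which reads through a NULL pointer) replaces the SWIG `_segfault` module, since ctypes did not exist in Python 2.3; and `run_simulation` and the shortened sleeps are made up for the demo. With NPTL, SIGSEGV terminates the whole process, so on POSIX the parent should see a return code of `-signal.SIGSEGV`. A `None` result would mean the child was still alive at the timeout.

```python
import signal
import subprocess
import sys
import textwrap

# Python 3 port of the simulation; ctypes.string_at(0) stands in for the
# SWIG _segfault module (an assumption, not the original extension).
CHILD = textwrap.dedent("""
    import ctypes, threading, time

    _lock = threading.Lock()

    def worker():
        time.sleep(0.5)
        with _lock:
            ctypes.string_at(0)  # SIGSEGV while holding _lock

    for _ in range(4):
        threading.Thread(target=worker, daemon=True).start()
    time.sleep(30)
    print("Bye...")
""")

def run_simulation(timeout=10.0):
    """Return the child's exit status, or None if it was still running at timeout."""
    try:
        proc = subprocess.run([sys.executable, "-c", CHILD],
                              capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before re-raising.
        return None
    return proc.returncode

if __name__ == "__main__":
    status = run_simulation()
    if status == -signal.SIGSEGV:
        print("whole process died from SIGSEGV; no threads left waiting")
    else:
        print("unexpected status:", status)
```

The same child-process pattern is also a common way to keep a crash-prone extension from taking a long-running server down with it, though nothing here has been tried against Zope itself.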