RE: [Zope-dev] Segfault and Deadlock

2004-05-10 Thread alangmead




For another way round this issue of segfaults and deadlock when using
python 2.2, has anyone tried running Zope with a python built to use the
GNU Pth library instead of the system's pthread library?

GNU Pth is an entirely user-space library, so I would think its behavior
would remain consistent regardless of the system's thread implementation.
I'm not sure about the quality of that consistent behavior, though.


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


RE: [Zope-dev] Segfault and Deadlock

2004-05-06 Thread alangmead




I've submitted two patches to the Python patch collector.

One is something that should probably work with any pthreads-based Unix
implementation. It simply unblocks the type of signals that are normally
delivered synchronously and that the pthreads standard says should not be
blocked.
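
As a rough illustration of that first idea, here is a minimal sketch in
modern Python 3 (not the C patch itself, and not the Python 2.x of this
thread). It assumes the "synchronous" signals are SIGSEGV, SIGBUS, SIGFPE
and SIGILL; the names SYNCHRONOUS_SIGNALS and worker_signal_mask are made
up for the example.

```python
import signal
import threading

# Assumption: these are the signals "normally delivered synchronously".
SYNCHRONOUS_SIGNALS = {signal.SIGSEGV, signal.SIGBUS,
                       signal.SIGFPE, signal.SIGILL}


def worker_signal_mask():
    """Run a thread that blocks every signal except the synchronous ones
    and return the signal mask that thread ended up with."""
    seen = {}

    def worker():
        # Block everything, as Python does for non-main threads ...
        signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
        # ... then unblock the signals that should stay deliverable.
        signal.pthread_sigmask(signal.SIG_UNBLOCK, SYNCHRONOUS_SIGNALS)
        # A SIG_BLOCK call with an empty set only reports the current mask.
        seen["mask"] = signal.pthread_sigmask(signal.SIG_BLOCK, [])

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen["mask"]
```

With this mask a SIGSEGV raised inside the worker is no longer held back,
while asynchronous signals such as SIGTERM stay blocked in that thread.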

Another patch redirects LinuxThreads asynchronous signals to Python's main
thread. Right now it is done at compile time, but I think I can change this
to a runtime check.

As the patches are written, I doubt they can both be applied onto a
standard Python. The purposes don't conflict, though, and they could
probably both be used.




RE: [Zope-dev] Segfault and Deadlock

2004-05-05 Thread alangmead




Carl Witty <[EMAIL PROTECTED]> wrote on 05/04/2004 08:18:52 PM:
> I don't think it should be tested for in configure (or at compile-time
> at all).  People will want to have binary distributions that work both
> with LinuxThreads and NPTL; some people actually switch back and forth
> on an application-by-application basis.  It would be much better to
> check at runtime.

You do have some good points. I did implement the compile-time check,

<http://sourceforge.net/tracker/index.php?func=detail&aid=948614&group_id=5470&atid=305470>

but I can see if I can rework it in a way that wouldn't adversely affect
other systems or NPTL systems.




RE: [Zope-dev] Segfault and Deadlock

2004-05-04 Thread Carl Witty
On Mon, 2004-05-03 at 15:57, [EMAIL PROTECTED] wrote:
> 
> 
> "Tim Peters" <[EMAIL PROTECTED]> wrote on 05/03/2004 04:41:08 PM:
> > [EMAIL PROTECTED]
> >
> > If someone cares enough to work up a patch, Python's patch tracker is
> > open all night:
> >
> > http://sf.net/tracker/?atid=305470&group_id=5470
> 
> I might be willing to try my hand at this, but I could use a tiny bit of
> guidance. (If you don't mind.)
> 
> It seems that the patch should only be activated for LinuxThreads, and
> should be tested for in configure.

I don't think it should be tested for in configure (or at compile-time
at all).  People will want to have binary distributions that work both
with LinuxThreads and NPTL; some people actually switch back and forth
on an application-by-application basis.  It would be much better to
check at runtime.

Carl Witty




RE: [Zope-dev] Segfault and Deadlock

2004-05-04 Thread Tim Peters
As Andrew Langmead has already discovered, the LinuxThreads issue with
SIGSEGV was reported on the Python bug tracker almost a year ago (well,
reported, but not diagnosed):

SIGSEGV causes hung threads (Linux)
http://www.python.org/sf/756924

Looks like:

can't CNTRL-C when running os.system in a thread
http://www.python.org/sf/756940

is related.

python-dev'ers, do we have a release manager for 2.3.4 (I didn't see a
resolution to the brouhaha at the end of March)?  If so, is 2.3.4 still
planned for this month?

tick-tock-tick-tock-ing-ly y'rs  - tim




RE: [Zope-dev] Segfault and Deadlock

2004-05-03 Thread Tim Peters
[EMAIL PROTECTED]

[... snip good explanations ...]

> In order to get LinuxThreads to support the Python's threading
> semantics, what probably needs to be done is to have
> PyThread_init_thread set all handlers to call kill(main_thread, sig)
> to signal the main thread.

If someone cares enough to work up a patch, Python's patch tracker is open
all night:

http://sf.net/tracker/?atid=305470&group_id=5470




RE: [Zope-dev] Segfault and Deadlock

2004-05-03 Thread alangmead




"Tim Peters" <[EMAIL PROTECTED]> wrote on 05/03/2004 04:41:08 PM:
> [EMAIL PROTECTED]
>
> If someone cares enough to work up a patch, Python's patch tracker is
> open all night:
>
> http://sf.net/tracker/?atid=305470&group_id=5470

I might be willing to try my hand at this, but I could use a tiny bit of
guidance. (If you don't mind.)

It seems that the patch should only be activated for LinuxThreads, and
should be tested for in configure.

Is it reasonable to test for a LinuxThreads-specific function (like
pthread_kill_other_threads_np)? Should I create a functional test that
tries to cause the LinuxThreads-specific behavior (cause a deadlock), and
then notices the problem and fixes it? Should I use the glibc feature
"getconf GNU_LIBPTHREAD_VERSION"?

The first is easiest to test for, but seems a little error prone. (What if
someone else adds the non-standard function in order to ease porting from
Linux? What if someone comes up with a LinuxThreads update that solves this
problem?)  It's testing a feature that is related to the feature I want info
for, but not the troublesome behavior itself.

The second solution seems to be one step away from the halting problem
(although it might be able to be done with "block signal_a, send signal_a,
send signal_b; if signal_b is caught but not signal_a, then signals are not
rerouted across threads").
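
A behavioural probe along those lines could look roughly like the sketch
below. It is written in modern Python 3 rather than as a configure test,
and it is a simpler variant than the two-signal scheme above (my
assumption, not the author's design): block one signal everywhere, send it
to the process, and see whether a helper thread can pick it up. The
function name is hypothetical, and signal.sigtimedwait is not available on
every platform.

```python
import os
import signal
import threading


def process_signal_reaches_other_thread(sig=signal.SIGUSR1, timeout=2.0):
    """Return True if a signal sent to the process as a whole can be
    picked up by a thread other than the one that sent it."""
    result = {"seen": False}

    def waiter():
        # sigtimedwait() takes a pending, blocked signal off the queue.
        info = signal.sigtimedwait({sig}, timeout)
        result["seen"] = info is not None and info.si_signo == sig

    # Block the signal first so the helper thread inherits the blocked mask.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        helper = threading.Thread(target=waiter)
        helper.start()
        os.kill(os.getpid(), sig)  # addressed to the process, not a thread
        helper.join()
        # Drain the signal if the helper never saw it, so that restoring
        # the mask cannot trigger the signal's default action.
        signal.sigtimedwait({sig}, 0)
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return result["seen"]
```

On a threads implementation that follows POSIX the helper should see the
signal. Under the LinuxThreads behaviour discussed in this thread, where
each thread is its own process, I would expect it to time out instead.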

The third option seems to be somewhere between the two (If getconf exists
and the symbol doesn't, then we have older linuxthreads. If the getconf
exists and the symbol returns linuxthreads, then we have newer
linuxthreads. Otherwise assume a compliant pthread.)
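
The decision table for this third option can be sketched as follows, again
in modern Python 3 and done at run time rather than in configure. The
function names and return strings are made up for the example. The probe
assumes os.confstr accepts the name "CS_GNU_LIBPTHREAD_VERSION", which is
how glibc systems expose the getconf value; where that name is unknown the
probe treats it like "the symbol doesn't exist".

```python
import os


def classify_thread_library(getconf_available, version):
    """Apply the three-way rule to the getconf result.

    version is the string for GNU_LIBPTHREAD_VERSION, or None when the
    symbol is not known on this system.
    """
    if not getconf_available:
        return "assume compliant pthreads"
    if version is None:
        return "older LinuxThreads"
    if "linuxthreads" in version.lower():
        return "newer LinuxThreads"
    return "assume compliant pthreads"


def probe_thread_library():
    """Query the running system and classify it."""
    if not hasattr(os, "confstr"):
        return classify_thread_library(False, None)
    try:
        version = os.confstr("CS_GNU_LIBPTHREAD_VERSION")
    except (ValueError, OSError):
        # Name unknown to this C library: treat as "symbol doesn't exist".
        version = None
    return classify_thread_library(True, version)
```

Keeping the rule in a pure function makes it easy to check each branch
without needing an old LinuxThreads machine.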


Is it reasonable to put a LinuxThreads specific replacement
SET_THREAD_SIGMASK  in thread_pthread.h? There are already a slew of
system specific defines, and the differences don't seem extreme enough to
make a separate thread_linuxthreads.h

This has, of course, long veered off from being about zope development, so
anyone wishing to contact me off list, feel free.




RE: [Zope-dev] Segfault and Deadlock

2004-05-03 Thread Tim Peters
[EMAIL PROTECTED], on special-casing LinuxThreads]
> I might be willing to try my hand at this, but I could use a tiny bit of
> guidance. (If you don't mind.)

I don't mind, but I haven't run on Linux since 1994, and have lost
track of how Unixish special-casing is done in Python since then.  Best
advice is to start with a bug report on Python's bug tracker, and perhaps a
msg to mailto:[EMAIL PROTECTED]

I think Martin v. Löwis is currently most knowledgeable about messy config
issues in Python.

> It seems that the patch should only be activated for LinuxThreads, and
> should be tested for in configure.

Sounds plausible, but I wouldn't know.

> Is it reasonable to test for a LinuxThreads-specific function (like
> pthread_kill_other_threads_np)? Should I create a functional test that
> tries to cause the LinuxThreads-specific behavior (cause a deadlock), and
> then notices the problem and fixes it? Should I use the glibc feature
> "getconf GNU_LIBPTHREAD_VERSION"?

I don't know what's available in LinuxThreads *to* test.  Most packages have
some God-awful preprocessor #define to key off of.  Also don't know whether
the specific breakage at issue here is unique to LinuxThreads.

> The first is easiest to test for, but seems a little error prone. (What if
> someone else adds the non-standard function in order to ease porting from
> Linux? What if someone comes up with a LinuxThreads update that solves
> this problem?)  It's testing a feature that is related to the feature I
> want info for, but not the troublesome behavior itself.

I expect that's why most people settle for testing a package-specific
#define.  It's also why there's always at least some resistance to patches
that do key off goofy symbols:  the #ifdef'ed code will probably remain
there forever, regardless of whether the problem still exists.  So:

> The second solution seems to be one step away from the halting problem
> (although it might be able to be done with "block signal_a, send signal_a,
> send signal_b; if signal_b is caught but not signal_a, then signals are
> not rerouted across threads").

An autoconf-able test that checks for the actual bad behavior would be best.

...

> Is it reasonable to put a LinuxThreads specific replacement
> SET_THREAD_SIGMASK  in thread_pthread.h?

Yes.

> There are already a slew of system specific defines, and the
> differences don't seem extreme enough to make a separate
> thread_linuxthreads.h

Fully agreed.  LinuxThreads is primarily pthreads with a bug.  That makes it
qualitatively the same as all other pthreads implementations.




RE: [Zope-dev] Segfault and Deadlock

2004-05-03 Thread alangmead






"Tim Peters" <[EMAIL PROTECTED]> wrote on 05/03/2004 03:47:31 PM:
> [Dieter Maurer]
> I'm not clear on exactly what "blocked" means.

It has a very specific meaning with Unix signals. The kernel still has the
signal for the process waiting in a queue, but the process has told the
kernel that it is not interested in receiving it yet. Blocking is set by
the pthread_sigmask or sigprocmask functions mentioned below.
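
To make "blocked" concrete, here is a minimal sketch in modern Python 3
(whose signal module wraps pthread_sigmask; this API did not exist in the
Python versions discussed in this thread). The function name is made up,
and the sketch assumes it runs in the main thread, since that is the only
place Python lets you install a handler.

```python
import signal


def demo_blocked_signal(sig=signal.SIGUSR1):
    """Send ourselves a blocked signal and watch it wait in the queue."""
    received = []
    old_handler = signal.signal(
        sig, lambda signum, frame: received.append(signum))
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        signal.raise_signal(sig)  # queued, because sig is blocked
        pending_while_blocked = sig in signal.sigpending()
        handled_while_blocked = len(received)
    finally:
        # Restoring the old mask unblocks sig; the queued signal is
        # delivered and the handler runs.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    handled_after_unblock = len(received)
    signal.signal(sig, old_handler)
    return pending_while_blocked, handled_while_blocked, handled_after_unblock
```

The signal is neither lost nor handled while blocked; it simply stays
pending until the mask is lifted.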

>  The comments at the top of
> signalmodule.c say:
>
> ...
>
>When threads are supported, we want the following semantics:
>
>- only the main thread can set a signal handler
>- any thread can get a signal handler
>- signals are only delivered to the main thread
>
> ...
>
> That's the intent.
[stuff deleted]

For a POSIX-compatible pthread library, Python's current implementation
(set all signal handlers in the initial thread, start all subsequent
threads with signals blocked) will produce the intended Python threading
model behavior described above.
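
The "start all subsequent threads with signals blocked" part can be
sketched like this in modern Python 3. It relies on the fact that a new
thread inherits its creator's signal mask; the helper names are made up,
and this is an illustration of the technique rather than Python's actual C
code.

```python
import signal
import threading


def current_mask():
    # A SIG_BLOCK call with an empty set only reports the current mask.
    return signal.pthread_sigmask(signal.SIG_BLOCK, [])


def start_thread_with_signals_blocked(target):
    """Start target in a new thread that has every signal blocked."""
    # Temporarily block everything in the creating thread ...
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK,
                                      signal.valid_signals())
    try:
        t = threading.Thread(target=target)
        t.start()  # ... so the new thread inherits the blocked mask ...
    finally:
        # ... then put the creator's own mask back.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return t
```

On a POSIX-compliant system the kernel then has only the initial thread
left to deliver process-wide signals to, which is what gives the "signals
are only delivered to the main thread" behavior.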

For LinuxThreads, blocked signals in threads is exactly where it is
incompatible with POSIX. Since LinuxThreads threads are (not so) cleverly
disguised processes, each with its own PID, signals can be sent to a single
thread, and if blocked there they will never get rerouted to another thread.
(When the default action for a signal is to terminate and a thread is left
at that default, the internal thread management will notice that one thread
died of a signal and will handle the rest.)

>
> > I verified that in the SIGSEGV case above, all remaining threads
> > had "SIGSEGV" blocked.
> >
> > I may try to change Python to not block SIGSEGV and see
> > whether we get again the old Python 2.1.3 behaviour.
>
> The relevant change is probably in Python/thread_pthread.h.  Guido added
> a call to pthread_sigmask (or sigprocmask, depending on how broken the
> platform pthread support is ...),

In order to get LinuxThreads to support Python's threading semantics,
what probably needs to be done is to have PyThread_init_thread set all
handlers to call kill(main_thread, sig) to signal the main thread.
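
The forwarding idea can be illustrated at the Python level with a small
sketch in modern Python 3. It is only an analogue of the proposed C change:
a worker thread uses signal.pthread_kill to aim a signal at the main thread
instead of handling it itself. The function name is made up, and the sketch
assumes it is called from the main thread.

```python
import signal
import threading
import time


def forward_to_main_thread(sig=signal.SIGUSR1, timeout=2.0):
    """Let a worker thread redirect sig to the main thread and report
    which thread ran the handler (True means the main thread)."""
    handled_in_main = []

    def handler(signum, frame):
        handled_in_main.append(
            threading.current_thread() is threading.main_thread())

    old_handler = signal.signal(sig, handler)
    main_ident = threading.main_thread().ident
    try:
        # The worker does nothing but pass the signal on to the main thread.
        worker = threading.Thread(target=signal.pthread_kill,
                                  args=(main_ident, sig))
        worker.start()
        worker.join()
        # Give the interpreter a moment to run the Python-level handler.
        deadline = time.monotonic() + timeout
        while not handled_in_main and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        signal.signal(sig, old_handler)
    return handled_in_main
```

In the C version the same redirect would live in the per-thread signal
handler, using the main thread's PID as described above.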






RE: [Zope-dev] Segfault and Deadlock

2004-05-03 Thread Tim Peters
[Dieter Maurer]
> The reason why I believe Python is to blame:

Then this should really move to a Python bug tracker.

>   With Python 2.1.3, a SIGSEGV in one thread killed them all;
>   with Python 2.3.3, a SIGSEGV in one thread kills one
>   of them (the main thread, not the thread that got the SIGSEGV)
>   but brings the others in a funny state.
>
>   This is on the same OS (Linux 2.4 kernel without NPTL).
>
>   Apparently, Python's handling of SIGSEGV signals
>   changed between 2.1.3 and 2.3.3.

SIGSEGV is mentioned only in Python's signalmodule.c.  You can use ViewCVS
to show a diff between the 2.1.3 state of that (tag r213) and current HEAD.
I don't see any possibly relevant differences:

http://cvs.sf.net/viewcvs.py/python/python/dist/src/Modules/signalmodule.c


> In an earlier post, someone reported that Python explicitly
> blocks most signals in non-main threads.

I'm not clear on exactly what "blocked" means.  The comments at the top of
signalmodule.c say:

...

   When threads are supported, we want the following semantics:

   - only the main thread can set a signal handler
   - any thread can get a signal handler
   - signals are only delivered to the main thread

...

That's the intent.
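
The first two rules are easy to observe from Python code. A minimal sketch
in modern Python 3 (the function name is made up): a non-main thread can
read a handler but gets a ValueError when it tries to set one.

```python
import signal
import threading


def handler_access_from_worker(sig=signal.SIGUSR2):
    """Try to get and to set a signal handler from a non-main thread."""
    outcome = {}

    def worker():
        # Any thread can get a signal handler.
        signal.getsignal(sig)
        outcome["get"] = "ok"
        # Only the main thread can set one.
        try:
            signal.signal(sig, signal.SIG_DFL)
            outcome["set"] = "ok"
        except ValueError:
            outcome["set"] = "ValueError"

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return outcome
```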

> I verified that in the SIGSEGV case above, all remaining threads
> had "SIGSEGV" blocked.
>
> I may try to change Python to not block SIGSEGV and see
> whether we get again the old Python 2.1.3 behaviour.

The relevant change is probably in Python/thread_pthread.h.  Guido added a
call to pthread_sigmask (or sigprocmask, depending on how broken the
platform pthread support is ...), to PyThread__init_thread(), in revision
2.33.  The checkin comment begins:

Add SF patch #468347 -- mask signals for non-main pthreads, by
Jason Lowe:

This patch updates Python/thread_pthread.h to mask all
signals for any thread created. This will keep all
signals masked for any thread that isn't the initial
thread.  For Solaris and Linux, the two platforms I was
able to test it on, it solves bug #465673 (pthreads
need signal protection) and probably will solve bug
#219772 (Interactive Interpreter+ Thread -> core dump
at exit).

That was added before 2.1.3, but looks like it didn't get backported to the
2.1.3 maintenance branch before 2.1.3 was released.




RE: [Zope-dev] Segfault and Deadlock

2004-05-03 Thread alangmead






Dieter Maurer <[EMAIL PROTECTED]> wrote on 05/03/2004 01:48:57 PM:

> The reason why I believe Python is to blame:
>
>   With Python 2.1.3, a SIGSEGV in one thread killed them all;
>   with Python 2.3.3, a SIGSEGV in one thread kills one
>   of them (the main thread, not the thread that got the SIGSEGV)
>   but brings the others in a funny state.


You are right. The change in question causes new threads to be created with
signals blocked. (The commit messages, and a lot of the threading code in
Python, talk about "except for the main thread." I'm not sure if Python's
threading abstraction has any concept of a main thread, but POSIX has none.
All threads are peers.)

The POSIX spec defines asynchronous signals to be sent to "the process as a
whole", which runs afoul of the older Linux threading model, in which
threads are really cleverly disguised processes and each thread has a PID.

The switch in the signal handling between 2.1.3 and 2.3.3 (subsequent
threads after the initial thread are created with signals blocked)
explicitly triggers this LinuxThreads bug.



>
>   This is on the same OS (Linux 2.4 kernel without NPTL).
>
>   Apparently, Python's handling of SIGSEGV signals
>   changed between 2.1.3 and 2.3.3.
>
>
> In an earlier post, someone reported that Python explicitly
> blocks most signals in non-main threads.
> I verified that in the SIGSEGV case above, all remaining threads
> had "SIGSEGV" blocked.
>
> I may try to change Python to not block SIGSEGV and see
> whether we get again the old Python 2.1.3 behaviour.
>
> --
> Dieter
>


RE: [Zope-dev] Segfault and Deadlock

2004-05-03 Thread Dieter Maurer
Tim Peters wrote at 2004-5-2 23:16 -0400:
> ...
>Suppose a thread dies while holding the GIL (Python's global interpreter
>lock).  Will the GIL be released so that another thread (including the main
>thread) can continue?  There's no general answer to that.  I expect that
>under *most* platform threading implementations, all threads will be dead in
>the water then, because threads are intentionally (by the OS and C runtime)
>lightweight objects under most implementations, and don't save away enough
>info to make it *possible* for the platform thread runtime to recover
>gracefully in case of thread disaster.

That would not be necessary as long as all threads die.

The reason why I believe Python is to blame:

  With Python 2.1.3, a SIGSEGV in one thread killed them all;
  with Python 2.3.3, a SIGSEGV in one thread kills one
  of them (the main thread, not the thread that got the SIGSEGV)
  but brings the others in a funny state.

  This is on the same OS (Linux 2.4 kernel without NPTL).

  Apparently, Python's handling of SIGSEGV signals
  changed between 2.1.3 and 2.3.3.


In an earlier post, someone reported that Python explicitly
blocks most signals in non-main threads.
I verified that in the SIGSEGV case above, all remaining threads
had "SIGSEGV" blocked.

I may try to change Python to not block SIGSEGV and see
whether we get again the old Python 2.1.3 behaviour.

-- 
Dieter



Re: [Zope-dev] Segfault and Deadlock

2004-05-03 Thread alangmead






Dieter Maurer <[EMAIL PROTECTED]> wrote on 05/02/2004 01:28:48 PM:

> Willi Langenberger wrote at 2004-5-2 17:10 +0200:
>
> What is "NPTL"?

It stands for "Native POSIX Thread Library". It is a new threads subsystem
that is included in Linux 2.6 and that Red Hat has backported into their 2.4
kernels. It has some performance advantages and more correct POSIX
behavior (especially in terms of signal handling) than the older
LinuxThreads system.

>
> >...
> >PS: A RedHat-9 system (kernel 2.4.20, with NPTL) shows a different
> >behaviour. After the segfault, all threads disappeared. So maybe
> >all is ok with NPTL, but i've not tested it yet...
>
> That is the good behaviour. Thus, we only have to learn
> how we can get "NPTL" for all Linux systems.
>

The choices seem to be to use a Linux 2.6 kernel, or to use a Red Hat 2.4
kernel with NPTL backported into it. (The earliest releases of Red Hat 9
had problems, but they seem to have been fixed in later kernel and glibc
updates.)

The older LinuxThreads library has a non-standard threading function
pthread_kill_other_threads_np  that can be used as a workaround to notify
other threads of termination.




RE: [Zope-dev] Segfault and Deadlock

2004-05-02 Thread Tim Peters
[EMAIL PROTECTED]
> Hi Zope (and Python) experts!
>
> There seems to be a problem when an external python module segfaults
> during a zope request. The remaining worker threads are deadlocked.

Maybe, maybe not.  Python (and so also Zope) uses platform-native thread
facilities, and what happens when SIGSEGV gets signaled is mostly up to
them.  That's why you see different behavior, e.g., between Linux with NPTL
and Linux without NPTL:  the OS and C runtime determine exceptional thread
semantics, and Python isn't the operating system.

Suppose a thread dies while holding the GIL (Python's global interpreter
lock).  Will the GIL be released so that another thread (including the main
thread) can continue?  There's no general answer to that.  I expect that
under *most* platform threading implementations, all threads will be dead in
the water then, because threads are intentionally (by the OS and C runtime)
lightweight objects under most implementations, and don't save away enough
info to make it *possible* for the platform thread runtime to recover
gracefully in case of thread disaster.  The "natural" (least effort)
behavior is for the system to kill off the thread simply ignoring whatever
resources it may be holding.  In that case, all Python threads remaining
will hang forever waiting to acquire the GIL.
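
The same effect is easy to reproduce with an ordinary lock standing in for
the GIL. A minimal sketch in modern Python 3 (the function name is made
up): one thread takes the lock and ends without releasing it, and a later
waiter can only give up after a timeout.

```python
import threading


def acquire_after_holder_is_gone(timeout=0.2):
    """Return whether a lock can still be acquired after the thread that
    held it has ended without releasing it."""
    lock = threading.Lock()

    def doomed():
        lock.acquire()
        # The thread ends here, still holding the lock. Nothing in the
        # runtime releases it on the thread's behalf.

    t = threading.Thread(target=doomed)
    t.start()
    t.join()
    # Without the timeout this call would block forever.
    return lock.acquire(timeout=timeout)
```

The call returns False: the lock outlives its holder, just as the GIL
would.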

I expect the best that can be done, short of heroic effort (like writing
your own platform thread implementation), is to document what the various
thread implementations actually do.

> ...
> The reason is the way python handles threads on some systems
> (RedHat-7.3, kernel 2.4.20, without NPTL).

If you search the Python implementation, you'll find that there's nothing
different in what Python does depending on whether NPTL is present.  On any
system, all Python asks of the platform thread gimmicks is (a) a way to
start a thread, and (b) a way to implement Python lock semantics.  On any
POSIX system, #b is done with POSIX semaphores

#if defined(_POSIX_SEMAPHORES) && !defined(HAVE_BROKEN_POSIX_SEMAPHORES)

else #b is done with a combination of POSIX mutexes and POSIX condition
variables.  It could be that whether POSIX semaphores are available on Linux
depends on whether NPTL is in place -- I don't know.  But if so, that may be
the relevant difference.
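
For readers who have not seen the fallback scheme, here is a rough Python 3
sketch of a lock built from a mutex plus a condition variable. It mirrors
the idea only, not the C code in thread_pthread.h; the class name CondLock
is made up, and waitflag follows the parameter name of
PyThread_acquire_lock.

```python
import threading


class CondLock:
    """A lock built from a mutex and a condition variable."""

    def __init__(self):
        # The Condition wraps the mutex that guards the _locked flag.
        self._cond = threading.Condition(threading.Lock())
        self._locked = False

    def acquire(self, waitflag=True):
        with self._cond:
            if self._locked and not waitflag:
                return False  # non-blocking attempt failed
            while self._locked:
                self._cond.wait()  # sleep until release() signals
            self._locked = True
            return True

    def release(self):
        with self._cond:
            self._locked = False
            self._cond.notify()  # wake one waiter, if any
```

Unlike a bare mutex, a lock like this can be released by a thread other
than the one that acquired it, which is part of the lock semantics Python
needs.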




Re: [Zope-dev] Segfault and Deadlock

2004-05-02 Thread Jens Vagelpohl
On 2 May 2004, at 13:28, Dieter Maurer wrote:

> Willi Langenberger wrote at 2004-5-2 17:10 +0200:
> > ...
> > The reason is the way python handles threads on some systems
> > (RedHat-7.3, kernel 2.4.20, without NPTL).
>
> What is "NPTL"?

The "native POSIX thread library" or something like that. It's a new
threading implementation that was introduced with an update to RedHat 9.
Fedora Core has it by default, as does RH Enterprise Server 3, I
believe.

jens





Re: [Zope-dev] Segfault and Deadlock

2004-05-02 Thread Willi Langenberger
According to Dieter Maurer:
> >The reason is the way python handles threads on some systems
> >(RedHat-7.3, kernel 2.4.20, without NPTL).
> 
> What is "NPTL"?

Native POSIX Thread Library.

> That is the good behaviour. Thus, we only have to learn
> how we can get "NPTL" for all Linux systems.

However, i don't know enough about NPTL. Only that it caused us some
grief when we migrated applications from RedHat-7.3 to RedHat-9 (we
had to set LD_ASSUME_KERNEL=2.4.1 for some applications [including
oracle] to work).

> By the way, nobody answered my problem report on comp.lang.python.
> Was maybe a bad time, during "Pycon".

Yes, i think it is more a python problem than a zope problem. But it
bites the Zope server on a linux system w/o NPTL. Maybe we have more
luck this time...


\wlang{}

-- 
[EMAIL PROTECTED]  Fax: +43/1/31336/9207
Zentrum fuer Informatikdienste, Wirtschaftsuniversitaet Wien, Austria



Re: [Zope-dev] Segfault and Deadlock

2004-05-02 Thread Dieter Maurer
Willi Langenberger wrote at 2004-5-2 17:10 +0200:
> ...
>The reason is the way python handles threads on some systems
>(RedHat-7.3, kernel 2.4.20, without NPTL).

What is "NPTL"?

>...
>PS: A RedHat-9 system (kernel 2.4.20, with NPTL) shows a different
>behaviour. After the segfault, all threads disappeared. So maybe
>all is ok with NPTL, but i've not tested it yet...

That is the good behaviour. Thus, we only have to learn
how we can get "NPTL" for all Linux systems.


By the way, nobody answered my problem report on comp.lang.python.
Was maybe a bad time, during "Pycon".

-- 
Dieter



[Zope-dev] Segfault and Deadlock

2004-05-02 Thread Willi Langenberger
Hi Zope (and Python) experts!

There seems to be a problem when an external python module segfaults
during a zope request. The remaining worker threads are deadlocked.

I think this is the same problem as Dieter pointed out in his message
to zope-dev "[Problem] strange state after SIGSEGV":

  http://mail.zope.org/pipermail/zope-dev/2004-March/022092.html

The reason is the way python handles threads on some systems
(RedHat-7.3, kernel 2.4.20, without NPTL). I've written a small python
extension, which does nothing but segfault[1]. With this, i made the
following simulation, where one thread acquires a lock and segfaults:

  #!/usr/bin/env python2.3

  import thread
  import time
  import _segfault

  _lock = thread.allocate_lock()

  def worker():
      time.sleep(10)
      _lock.acquire()
      _segfault.segfault()  # crashes while the lock is held
      _lock.release()       # never reached

  thread.start_new_thread(worker, ())
  thread.start_new_thread(worker, ())
  thread.start_new_thread(worker, ())
  thread.start_new_thread(worker, ())

  time.sleep(3600)

  print 'Bye...'

On my RedHat-7.3 box (kernel 2.4.20-18, without NPTL) i get the
following behaviour. After starting the program, pstree shows this:

  bash(4103,wlang)---python2.3(4333)---python2.3(4334)-+-python2.3(4335)
                                                       |-python2.3(4336)
                                                       |-python2.3(4337)
                                                       `-python2.3(4338)

After the 10 seconds sleep, one worker gets the lock, and
segfaults. After that, pstree shows this:

  init(1)-+-[...]
          |-python2.3(4336,wlang)
          |-python2.3(4337,wlang)
          |-python2.3(4338,wlang)

Three remaining worker threads (without main thread).

Gdb shows that they wait for the lock (but they won't get it):

  (gdb) info stack
  #0  0x420293d5 in sigsuspend () from /lib/i686/libc.so.6
  #1  0x40031609 in __pthread_wait_for_restart_signal ()
      from /lib/i686/libpthread.so.0
  #2  0x4003272c in sem_wait@@GLIBC_2.1 () from /lib/i686/libpthread.so.0
  #3  0x080c7b2d in PyThread_acquire_lock (lock=0x8170728, waitflag=1)
      at Python/thread_pthread.h:406
  [...]

(On a side note, as python threads block all signals, these worker
threads cannot be stopped with SIGTERM. They must be killed with SIGKILL.)

All this has the consequences Dieter described:
>   Consequences:
> 
> *  Zope did no longer respond to requests
> 
> *  "stop" did not work (as "SIGTERM" was ineffective)
> 
> *  "start" did not work, as the dangling processes kept
>the HTTP port bound.

So i think i know what's happening, but i don't know how to fix it!
Can anyone help please? Any hints are highly appreciated!


\wlang{}

PS: A RedHat-9 system (kernel 2.4.20, with NPTL) shows a different
behaviour. After the segfault, all threads disappeared. So maybe
all is ok with NPTL, but i've not tested it yet...

[1] segfault module

--- segfault.c ---

void
segfault(void)
{
  char *x = 0;

  *x = 'a';
}

--- segfault.i ---

%module segfault
%{
%}

void segfault(void);

--- building ---

$ swig -python segfault.i
$ gcc -I/usr/local/include/python2.3 -c segfault_wrap.c -o segfault_wrap23.o
$ gcc -c -o segfault.o segfault.c
$ gcc -shared segfault_wrap23.o segfault.o -o _segfault.so

-- 
[EMAIL PROTECTED]  Fax: +43/1/31336/9207
Zentrum fuer Informatikdienste, Wirtschaftsuniversitaet Wien, Austria
