Re: RE: [Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss

2004-06-28 Thread Dieter Maurer
Hi Tim,

Tim Peters wrote at 2004-6-27 17:06 -0400:
[Dieter Maurer]
 The problem occurred in a ZEO client which called asyncore.poll
 in the forked subprocess. This poll deterministically
 stole ZEO server invalidation messages from the parent.

I'm sorry, but this is still too vague to guess what happened.

Even though I sometimes make errors, my responses usually contain
all the relevant information.

- Which operating system was in use?

The ZEO client application mentioned above is almost independent
of the operating system, apart from the fact that it uses
fork (and therefore requires the OS to support it).

Therefore, I did not mention that the application was running
on Linux 2.

- Which thread package?

The application mentioned above does not use any threads.
Therefore, it is independent of the thread package.
If it did use threads, they would be LinuxThreads (but it does not).


There is no mystery at all that the application lost ZEO server
invalidation messages. It directly follows from the fork
semantics with respect to file descriptors.


The problem I saw for wider Zope/ZEO client usage came solely
from reading the Linux fork manual page, which indicates
(or at least can be interpreted to say) that child and parent have the same threads.
There was no concrete observation that messages are lost or duplicated
in this scenario.


Meanwhile, I checked that fork under Linux with LinuxThreads
behaves with respect to threads as dictated by the POSIX
standard: the forked process has a single thread and
does not inherit other threads from its parent.
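
A check of this kind can be sketched in a few lines of Python. This is only an illustration, assuming CPython on a POSIX system; the helper thread and the function name are hypothetical stand-ins, and the count is the one reported by Python's threading module rather than by the kernel.

```python
import os
import threading

def thread_counts_across_fork():
    """Return (threads_in_parent, threads_in_child) as seen by the threading module."""
    stop = threading.Event()
    # Hypothetical helper thread; it stands in for an asyncore loop thread.
    helper = threading.Thread(target=stop.wait)
    helper.start()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report how many threads exist here, then exit immediately.
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    os.close(write_fd)
    in_parent = threading.active_count()
    os.waitpid(pid, 0)
    in_child = int(os.read(read_fd, 16))
    os.close(read_fd)

    stop.set()
    helper.join()
    return in_parent, in_child
```

On a POSIX-conforming fork the child should report a single thread while the parent reports at least two. Recent CPython versions emit a DeprecationWarning when fork() is called from a multi-threaded process; the sketch still runs.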

I will soon check how our Solaris version of Python behaves.
If this, too, has only one thread, I will apologize for
the premature warning...


- In the ZEO client that called fork(), did it call fork() directly, or
 indirectly as the result of a system() or popen() call?  Or what?
 I'd like to understand a specific failure before rushing to
 generalization.

The ZEO client has the basic structure:

while 1:
    work_to_do = get_work(...)
    for work in work_to_do:
        pid = fork()
        if pid == 0:
            do_work(work)
            # will not return
    sleep(...)

do_work opens a new ZEO connection.
get_work and do_work use asyncore.poll to
synchronize with incoming messages from ZEO; there is no
asyncore.mainloop around.
The poll in do_work stole ZEO invalidation messages
destined for the parent, so that get_work read old state
and returned work items that were already completed. That is the problem
I saw.
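
The effect can be reproduced without ZEO. The following is a minimal sketch, assuming a POSIX system; a socketpair stands in for the ZEO connection, and the function name and message text are made up for illustration. The child reads from the inherited socket, as a poll in the child would, and the parent then finds nothing left to read.

```python
import os
import socket

def demo_stolen_message():
    """Return (child_got, parent_got) after a fork shares one socket."""
    server_end, client_end = socket.socketpair()
    # A pipe lets the child report what it read back to the parent.
    report_r, report_w = os.pipe()

    server_end.sendall(b"invalidate oid 42")  # message meant for the parent

    pid = os.fork()
    if pid == 0:
        # Child: inherits client_end and reads from it, as a poll would.
        data = client_end.recv(1024)
        os.write(report_w, data)
        os._exit(0)

    os.close(report_w)
    os.waitpid(pid, 0)
    child_got = os.read(report_r, 1024)
    os.close(report_r)

    # Parent: the message is gone; a non-blocking read finds nothing.
    client_end.setblocking(False)
    try:
        parent_got = client_end.recv(1024)
    except BlockingIOError:
        parent_got = b""
    server_end.close()
    client_end.close()
    return child_got, parent_got
```

No threads are involved here; the loss follows from the shared file descriptor alone.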

All this is easy to understand, (almost) platform independent
and independent of the thread library.


*Iff* a thread library lets a forked child inherit all threads,
then the problem I announced in this Warning thread can
occur, as it then behaves similarly to my application
above (with an automatic rather than an explicit poll).

It may well be that there is no thread library that does this.
In your words: all thread implementations may be sane
with respect to thread inheritance...

-- 
Dieter
___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


RE: [Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss

2004-06-27 Thread Tim Peters
[sathya]
 so can we safely assume that zeo does not mix the asyncore implementation
 with  forks or threads and hence does not suffer from the child
 concurrently operating on sockets along with parent syndrome that
 dieter is experiencing ? appreciate any clarifications.

It's normal for a ZEO application to run asyncore in its own thread.  I
don't really understand what Dieter is seeing, though:

[Dieter]
   When a process forks the complete state, including file descriptors,
   threads and memory state is copied and the new process
   executes in this copied state.
   We now have 2 asyncore threads waiting for the same events.

A problem is that it's *not* the case that a POSIX fork() clones all
threads.  Only the thread calling fork() exists in the child process.
There's a brief but clear discussion of that here:

http://www.opengroup.org/onlinepubs/009695399/functions/fork.html

POSIX doesn't even have a way to *ask* that all threads be duplicated, for
reasons explained there.

Last I heard, Dieter was running LinuxThreads, which fail to meet the POSIX
thread spec in several respects.  But, AFAICT, fork() under LinuxThreads is
the same as POSIX in this particular respect (since threads are distinct
processes under LinuxThreads, it would be bizarre if a fork() cloned
multiple processes!).  I believe native Solaris threads act as Dieter
describes, though (fork() clones all native Solaris threads).

Dieter, can you clarify which OS(es) and thread package(s) you're using
here?  Do the things you're doing that call fork() (directly or indirectly)
actually run from the thread running asyncore.loop()?  That's the only way a
POSIX fork() should end up with a clone of the thread running the asyncore
loop.  But then the subsequent exec (if you're doing system() or popen())
should wipe out the cloned asyncore code before the child process returns to
asyncore.



RE: [Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss

2004-06-27 Thread Dieter Maurer
Tim Peters wrote at 2004-6-27 04:46 -0400:
 ...
[Dieter]
   When a process forks the complete state, including file descriptors,
   threads and memory state is copied and the new process
   executes in this copied state.
   We now have 2 asyncore threads waiting for the same events.

A problem is that it's *not* the case that a POSIX fork() clones all
threads.  Only the thread calling fork() exists in the child process.
There's a brief but clear discussion of that here:

http://www.opengroup.org/onlinepubs/009695399/functions/fork.html

POSIX doesn't even have a way to *ask* that all threads be duplicated, for
reasons explained there.

Last I heard, Dieter was running LinuxThreads, which fail to meet the POSIX
thread spec in several respects.  But, AFAICT, fork() under LinuxThreads is
the same as POSIX in this particular respect (since threads are distinct
processes under LinuxThreads, it would be bizarre if a fork() cloned
multiple processes!).  I believe native Solaris threads act as Dieter
describes, though (fork() clones all native Solaris threads).

Dieter, can you clarify which OS(es) and thread package(s) you're using
here?
 Do the things you're doing that call fork() (directly or indirectly)
actually run from the thread running asyncore.loop()?

The problem occurred in a ZEO client which called asyncore.poll
in the forked subprocess. This poll deterministically
stole ZEO server invalidation messages from the parent.

I read the Linux fork manual page and found:

  fork creates a child process that differs from the parent process
  only in its PID and PPID, and in the fact that resource utilizations
  are set to 0. File locks and pending signals are not inherited.

  ...

  The fork call conforms to SVr4, SVID, POSIX, X/OPEN, BSD 4.3


I concluded that if the only difference is in the PID/PPID
and resource utilizations,
there is no difference in the threads between parent and child.
This would mean that
the widespread asyncore.mainloop threads could suffer
the same message loss and message duplication.

I did not observe a message loss/duplication in any
application with an asyncore.mainloop thread.


Maybe the Linux fork manual page is merely imprecise with respect
to threads, and the problem does not occur in applications
with a standard asyncore.mainloop thread.


-- 
Dieter


Re: RE: [Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss

2004-06-27 Thread Tim Peters
[Dieter Maurer]
 The problem occurred in a ZEO client which called asyncore.poll
 in the forked subprocess. This poll deterministically
 stole ZEO server invalidation messages from the parent.

I'm sorry, but this is still too vague to guess what happened.

- Which operating system was in use?

- Which thread package?

- In the ZEO client that called fork(), did it call fork() directly, or
 indirectly as the result of a system() or popen() call?  Or what?
 I'd like to understand a specific failure before rushing to
 generalization.

- In the ZEO client that called fork() (whether directly or indirectly),
 was fork called *from* the thread running ZEO's asyncore loop,
 or from a different thread?

 I read the Linux fork manual page and found:
 
  fork creates a child process that differs from the parent process
  only in its PID and PPID, and in the fact that resource utilizations
  are set to 0. File locks and pending signals are not inherited.
 
  ...
 
  The fork call conforms to SVr4, SVID, POSIX, X/OPEN, BSD 4.3

If it conforms to POSIX (as it says it does), then fork() also has to
satisfy the huge list of requirements I referenced before:

   http://www.opengroup.org/onlinepubs/009695399/functions/fork.html

That page is the current POSIX spec for fork().

 I concluded that if the only difference is in the PID/PPID
 and resource utilizations, there is no difference in the threads between parent
 and child.  

Except that if you're running non-POSIX LinuxThreads, a thread *is* a
process (there's a one-to-one relationship under LinuxThreads, not the
many-to-one relationship in POSIX), in which case no difference in
threads is trivially true.

 This would mean that the widespread asyncore.mainloop threads could suffer
 the same message loss and message duplication.

That's why all sane <wink> threading implementations do what POSIX
does on a fork().  fork() and threading don't really mix well under
POSIX either, but the fork+exec model for starting a new process is
an historical burden that bristles with subtle problems in a
multithreaded world; POSIX introduced posix_spawn() and posix_spawnp()
for sane(r) process creation, ironically moving closer to what most
non-Unix systems have always done to create a new process.
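
For illustration, Python exposes this call as os.posix_spawn (available on POSIX platforms in Python 3.8 and later). The sketch below is only an example under that assumption; the function name is hypothetical. It starts a fresh interpreter without ever running the parent's Python code in the child, so nothing in the child gets a chance to poll sockets inherited from the parent.

```python
import os
import sys

def spawn_and_capture(code):
    """Run `code` in a fresh interpreter via posix_spawn; return (exit_code, stdout)."""
    read_fd, write_fd = os.pipe()
    # The file action is applied in the child before the new program starts:
    # it points the child's stdout at the pipe. The pipe descriptors created
    # by os.pipe() are non-inheritable, so the originals are closed on exec.
    actions = [(os.POSIX_SPAWN_DUP2, write_fd, 1)]
    pid = os.posix_spawn(
        sys.executable,
        [sys.executable, "-c", code],
        os.environ,
        file_actions=actions,
    )
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        output = pipe.read()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status), output
```

A call such as spawn_and_capture("print('spawned')") returns the child's exit code together with whatever it wrote to stdout.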

 I did not observe a message loss/duplication in any
 application with an asyncore.mainloop thread.

I don't understand.  You said that you *have* seen message
loss/duplication in a ZEO client, and I assume the ZEO client was
running an asyncore thread.  If so, then you have seen
loss/duplication in an application with an asyncore thread.

Or are you saying that you haven't seen loss/duplication under the
specific Linux flavor whose man page you quoted, but have seen it
under some other (so far unidentified) system?

 Maybe the Linux fork manual page is merely imprecise with respect
 to threads, and the problem does not occur in applications
 with a standard asyncore.mainloop thread.

That fork manpage is clearly missing a mountain of crucial details
(or it's not telling the truth about being POSIX-compliant).  fork()
is historically poorly documented, though.


Re: [Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss

2004-06-26 Thread sathya
Tim Peters wrote:
Hello Tim,
so can we safely assume that ZEO does not mix the asyncore
implementation with forks or threads, and hence does not suffer from the
"child concurrently operating on sockets along with the parent" syndrome
that Dieter is experiencing?
I'd appreciate any clarifications.
Regards
sathya


[Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss

2004-06-25 Thread Tim Peters
[Dieter Maurer]
 ATTENTION: Crosspost -- Reply-To set to '[EMAIL PROTECTED]'

Which I've honored.

 Today, I hit a nasty error.

 The error affects applications under Unix (and maybe Windows) which

   *  use an asyncore mainloop thread (and maybe other asyncore
  applications)

  Zope and many ZEO clients belong to this class

Note a possible complication:  ZEO monkey-patches asyncore, replacing its
loop() function with one of its own.  This is done in ZODB's
ThreadedAsync/LoopCallback.py.

 and

   *  create subprocesses (via fork and system, popen or friends if
  they use fork internally (they do under Unix but I think not
  under Windows)).

It may be an issue under Cygwin, but not under native Windows, which
supports no way to clone a process; file descriptors may get inherited by
child processes on Windows, but no code runs by magic.

 The error can cause non-deterministic loss of messages (HTTP requests,
 ZEO server responses, ...) destined for the parent process. It also can
 cause the same output to be sent several times over sockets.

 The error is explained as follows:

   asyncore maintains a map from file descriptors to handlers.
   The asyncore main loop waits for any file descriptor to
   become active and then calls the corresponding handler.

There's a key related point, though:  asyncore.loop() terminates if it sees
that the map has become empty.  This appears to have consequences for the
correctness of workarounds.  For example, this is Python's current asyncore
loop (the monkey-patched one ZEO installs is similar in this respect):

    def loop(timeout=30.0, use_poll=False, map=None):
        if map is None:
            map = socket_map

        if use_poll and hasattr(select, 'poll'):
            poll_fun = poll2
        else:
            poll_fun = poll

        while map:
            poll_fun(timeout, map)

If map becomes empty, loop() exits.


   When a process forks the complete state, including file descriptors,
   threads and memory state is copied and the new process
   executes in this copied state.
   We now have 2 asyncore threads waiting for the same events.

Sam Rushing created asyncore as an alternative to threaded approaches;
mixing asyncore with threads is a nightmare; throwing forks into the pot too
is a good working definition of hell <wink>.

   File descriptors are shared between parent and child.
   When the child reads from a file descriptor from its parent,
   it steals the corresponding message: the message will
   not reach the parent.

   While file descriptors are shared, memory state is separate.
   Therefore, pending writes can be performed by both
   parent and child -- leading to duplicate writes to the same
   file descriptor.
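
The duplicate-write half of this explanation can be shown with an ordinary buffered file object. This is a minimal sketch, assuming a POSIX system; a pipe stands in for the socket, and the function name and payload are invented for the example. The pending bytes live in process memory, so after the fork both processes hold a copy and both flush it to the one shared descriptor.

```python
import os

def duplicated_write():
    """Return the bytes that arrive when parent and child both flush one buffer."""
    read_fd, write_fd = os.pipe()
    out = os.fdopen(write_fd, "wb")  # buffered writer; data is held in memory
    out.write(b"reply;")             # pending write, not yet flushed

    pid = os.fork()
    if pid == 0:
        out.flush()                  # child flushes its copy of the buffer
        os._exit(0)

    os.waitpid(pid, 0)
    out.flush()                      # parent flushes the same bytes again
    out.close()
    with os.fdopen(read_fd, "rb") as reader:
        return reader.read()
```

The reader sees the payload twice, even though the program wrote it only once.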


 A workaround is to deactivate asyncore before forking (or system,
 popen, ...) and reactivate it afterwards, as exemplified in the
 following code:

  from asyncore import socket_map
  saved_socket_map = socket_map.copy()
  socket_map.clear() # deactivate asyncore

As noted above, this may (or may not) cause asyncore.loop() to plain stop,
in parent and/or in child process.  If there aren't multiple threads, it's
safe, but presumably you have multiple threads in mind, in which case
behavior seems unpredictable (will the parent process's thread running
asyncore.loop() notice that the map has become empty before the code below
populates the map again?  asyncore.loop() will or won't stop in the parent
depending on that timing accident).

  pid = None
  try:
      pid = fork()
      if (pid == 0):
          # child
          # ...
  finally:
      if pid != 0:
          socket_map.update(saved_socket_map) # reactivate asyncore
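
The timing dependence can be made visible with a small stand-in. The sketch below deliberately does not use asyncore itself: mini_loop is a hypothetical loop that merely shares the `while map:` structure quoted earlier, and the sleep stands in for the time spent between clearing and restoring the map. It assumes a POSIX system.

```python
import select
import socket
import threading
import time

def mini_loop(socket_map, timeout=0.05):
    """Stand-in for a loop thread: poll until the map is seen to be empty."""
    while socket_map:
        select.select(list(socket_map), [], [], timeout)

def loop_survives_clear(gap):
    """Clear the map, wait `gap` seconds, restore it; report if the loop lived."""
    a, b = socket.socketpair()
    socket_map = {a.fileno(): a}
    worker = threading.Thread(target=mini_loop, args=(socket_map,))
    worker.start()

    saved = socket_map.copy()
    socket_map.clear()           # "deactivate"
    time.sleep(gap)              # stands in for the time spent around fork()
    socket_map.update(saved)     # "reactivate"

    worker.join(timeout=0.5)
    survived = worker.is_alive()

    socket_map.clear()           # let the loop finish if it is still running
    worker.join()
    a.close()
    b.close()
    return survived
```

With a gap longer than the poll timeout the loop thread notices the empty map and exits, so restoring the map comes too late. With a very short gap it will usually keep running, but nothing guarantees that.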

Another approach I've seen is to skip mucking with socket_map directly, and
call asyncore.close_all() first thing in the child process.  Of course
that's vulnerable to vagaries of thread scheduling too, if asyncore is
running in a thread other than the one doing the fork() call.
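
A minimal sketch of that second approach, assuming a POSIX system and a single-threaded parent (so the scheduling caveat does not arise). It does not call asyncore.close_all(); a plain dict of sockets stands in for the asyncore map, and both function names are hypothetical.

```python
import os
import socket

def fork_child_closing_inherited(socket_map):
    """Fork; the child drops every inherited socket before doing any work."""
    pid = os.fork()
    if pid == 0:
        for sock in list(socket_map.values()):
            sock.close()         # closes only the child's copy of the descriptor
        socket_map.clear()
        # ... the child's real work would go here ...
        os._exit(0)
    return pid

def demo_parent_keeps_messages():
    """Return what the parent receives after forking a well-behaved child."""
    server_end, client_end = socket.socketpair()
    socket_map = {client_end.fileno(): client_end}
    pid = fork_child_closing_inherited(socket_map)
    server_end.sendall(b"invalidate oid 7")
    os.waitpid(pid, 0)
    received = client_end.recv(1024)
    server_end.close()
    client_end.close()
    return received
```

Because memory is separate after the fork, clearing the map in the child leaves the parent's map and descriptors untouched, and the parent still receives the message.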
