Re: RE: [Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss
Hi Tim,

Tim Peters wrote at 2004-6-27 17:06 -0400:
>[Dieter Maurer]
>> The problem occurred in a ZEO client which called asyncore.poll in the forked subprocess. This poll deterministically stole ZEO server invalidation messages from the parent.
>
>I'm sorry, but this is still too vague to guess what happened.

Even though I sometimes make errors, my responses usually contain all relevant information.

>- Which operating system was in use?

The ZEO client application mentioned above is almost independent of the operating system, apart from the fact that it uses fork (and therefore requires the OS to support it). Therefore, I did not mention that the application was running on Linux 2.

>- Which thread package?

The application mentioned above does not use any threads. Therefore, it is independent of the thread package. If it used threads, they would be LinuxThreads (but it does not).

There is no mystery at all about why the application lost ZEO server invalidation messages. It follows directly from the fork semantics with respect to file descriptors.

The problem I saw for wider Zope/ZEO client usage came solely from reading the Linux fork manual page, which indicates (or at least can be interpreted to mean) that child and parent have the same threads. There was no concrete observation that messages are lost/duplicated in this scenario.

Meanwhile, I checked that fork under Linux with LinuxThreads behaves with respect to threads as dictated by the POSIX standard: the forked process has a single thread and does not inherit other threads from its parent. I will soon check how our Solaris version of Python behaves. If this, too, has only one thread, I will apologize for the premature warning...

>- In the ZEO client that called fork(), did it call fork() directly, or indirectly as the result of a system() or popen() call? Or what? I'd like to understand a specific failure before rushing to generalization.

The ZEO client has the following basic structure:

    while 1:
        work_to_do = get_work(...)
        for work in work_to_do:
            pid = fork()
            if pid == 0:
                do_work(work)  # will not return
        sleep(...)

do_work opens a new ZEO connection. get_work and do_work use asyncore.poll to synchronize with incoming messages from ZEO; there is no asyncore.mainloop. The poll in do_work stole ZEO invalidation messages destined for the parent, so that get_work read old state and returned work items that had already been completed. That is the problem I saw.

All this is easy to understand, (almost) platform independent and independent of the thread library.

*Iff* a thread library lets a forked child inherit all threads, then the problem I announced in this Warning thread can occur, as such an application then behaves like my application above (with an automatic rather than an explicit poll). It may well be that there is no thread library that does this. In your words: all thread implementations may be sane with respect to thread inheritance...

--
Dieter

___
Zope-Dev maillist - [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )
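Dieter's point that the loss follows directly from fork's file-descriptor semantics can be reproduced without ZEO. The following is a minimal sketch, not code from the thread, assuming a Unix system with os.fork(); a socket.socketpair() stands in for the ZEO server connection, and the function name and message text are hypothetical. In Dieter's application the child's poll presumably reached the parent's connection because the child's copy of asyncore.socket_map still listed it; here the child simply reads from the inherited descriptor, and the message is gone when the parent looks for it.

```python
import os
import socket


def child_steals_message():
    """Fork with a message already buffered on a shared socket.

    Returns (bytes the child read, bytes the parent read afterwards).
    """
    # Hypothetical stand-in for a ZEO connection: `server` plays the ZEO
    # server, `client` is the parent's end of the connection.
    server, client = socket.socketpair()
    report_r, report_w = os.pipe()  # lets the child report what it read

    # An illustrative invalidation message arrives before the fork and
    # waits in the kernel buffer behind `client`.
    server.sendall(b"invalidate oid 42")

    pid = os.fork()
    if pid == 0:
        # Child: its `client` refers to the SAME open socket, so this
        # recv() removes the message from the shared buffer.
        try:
            os.write(report_w, client.recv(1024))
        finally:
            os._exit(0)

    os.close(report_w)
    os.waitpid(pid, 0)
    child_data = os.read(report_r, 1024)
    os.close(report_r)

    # Parent: the message is gone, so a non-blocking read finds nothing.
    client.setblocking(False)
    try:
        parent_data = client.recv(1024)
    except BlockingIOError:
        parent_data = b""
    server.close()
    client.close()
    return child_data, parent_data
```

Here the child reads first because the parent waits for it; with two processes polling concurrently, timing alone decides which one receives a given message.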
RE: [Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss
[sathya]
> so can we safely assume that zeo does not mix the asyncore implementation with forks or threads and hence does not suffer from the child concurrently operating on sockets along with parent syndrome that dieter is experiencing? appreciate any clarifications.

It's normal for a ZEO application to run asyncore in its own thread. I don't really understand what Dieter is seeing, though:

[Dieter]
> When a process forks, the complete state, including file descriptors, threads and memory state, is copied and the new process executes in this copied state. We now have 2 asyncore threads waiting for the same events.

A problem is that it's *not* the case that a POSIX fork() clones all threads. Only the thread calling fork() exists in the child process. There's a brief but clear discussion of that here:

    http://www.opengroup.org/onlinepubs/009695399/functions/fork.html

POSIX doesn't even have a way to *ask* that all threads be duplicated, for reasons explained there.

Last I heard, Dieter was running LinuxThreads, which fail to meet the POSIX thread spec in several respects. But, AFAICT, fork() under LinuxThreads is the same as POSIX in this particular respect (since threads are distinct processes under LinuxThreads, it would be bizarre if a fork() cloned multiple processes!). I believe native Solaris threads act as Dieter describes, though (fork() clones all native Solaris threads).

Dieter, can you clarify which OS(es) and thread package(s) you're using here? Do the things you're doing that call fork() (directly or indirectly) actually run from the thread running asyncore.loop()? That's the only way a POSIX fork() should end up with a clone of the thread running the asyncore loop. But then the subsequent exec (if you're doing system() or popen()) should wipe out the cloned asyncore code before the child process returns to asyncore.
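Tim's statement that a POSIX fork() keeps only the calling thread can be checked from Python. This is a minimal sketch, assuming a POSIX system with os.fork(); the "ticker" thread is a hypothetical stand-in for a thread running asyncore.loop(). The parent starts the ticker and forks; the child then watches its copy of the `ticks` list for 100 ms. If the ticker had been cloned into the child, that list would keep growing there too. Python 3.12 and later emit a DeprecationWarning when forking a process that has other threads running, which is itself a reminder of the hazards discussed in this thread.

```python
import os
import threading
import time


def ticker_runs_after_fork():
    """Return (ticker ran in child?, ticker ran in parent?) after fork()."""
    ticks = []
    stop = threading.Event()

    def ticker():
        # Hypothetical stand-in for an asyncore.loop() thread: it records
        # that it is still alive roughly every 5 ms.
        while not stop.is_set():
            ticks.append(1)
            time.sleep(0.005)

    thread = threading.Thread(target=ticker, daemon=True)
    thread.start()
    time.sleep(0.05)  # give the ticker time to start in the parent

    report_r, report_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: under POSIX only the thread that called fork() exists here,
        # so nothing should append to the child's copy of `ticks`.
        try:
            before = len(ticks)
            time.sleep(0.1)
            os.write(report_w, b"1" if len(ticks) > before else b"0")
        finally:
            os._exit(0)

    os.close(report_w)
    os.waitpid(pid, 0)
    ran_in_child = os.read(report_r, 1) == b"1"
    os.close(report_r)

    # Parent: the original ticker thread is still running here.
    before = len(ticks)
    time.sleep(0.1)
    ran_in_parent = len(ticks) > before

    stop.set()
    thread.join()
    return ran_in_child, ran_in_parent
```

On a POSIX-conforming system this is expected to return (False, True); it says nothing about non-POSIX thread packages.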
RE: [Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss
Tim Peters wrote at 2004-6-27 04:46 -0400:
> ...
>[Dieter]
>> When a process forks, the complete state, including file descriptors, threads and memory state, is copied and the new process executes in this copied state. We now have 2 asyncore threads waiting for the same events.
>
>A problem is that it's *not* the case that a POSIX fork() clones all threads. Only the thread calling fork() exists in the child process. There's a brief but clear discussion of that here:
>
>    http://www.opengroup.org/onlinepubs/009695399/functions/fork.html
>
>POSIX doesn't even have a way to *ask* that all threads be duplicated, for reasons explained there.
>
>Last I heard, Dieter was running LinuxThreads, which fail to meet the POSIX thread spec in several respects. But, AFAICT, fork() under LinuxThreads is the same as POSIX in this particular respect (since threads are distinct processes under LinuxThreads, it would be bizarre if a fork() cloned multiple processes!). I believe native Solaris threads act as Dieter describes, though (fork() clones all native Solaris threads).
>
>Dieter, can you clarify which OS(es) and thread package(s) you're using here? Do the things you're doing that call fork() (directly or indirectly) actually run from the thread running asyncore.loop()?

The problem occurred in a ZEO client which called asyncore.poll in the forked subprocess. This poll deterministically stole ZEO server invalidation messages from the parent.

I read the Linux fork manual page and found:

    fork creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.
    ...
    The fork call conforms to SVr4, SVID, POSIX, X/OPEN, BSD 4.3

I concluded that if the only differences are in the PID/PPID and resource utilizations, there is no difference in the threads between parent and child. This would mean that the widespread asyncore.mainloop threads could suffer the same message loss and message duplication.

I did not observe message loss/duplication in any application with an asyncore.mainloop thread. Maybe the Linux fork manual page is simply imprecise with respect to threads, and the problem does not occur in applications with a standard asyncore.mainloop thread.

--
Dieter
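The duplication Dieter mentions comes from the other half of fork's semantics: memory, including any output that has been queued but not yet flushed, is copied, while the socket underneath is shared. This is a minimal sketch, assuming a POSIX system; the `pending` list is a hypothetical stand-in for an asyncore dispatcher's output buffer, and the function name is made up for illustration. Both processes flush their own copy of the buffer, so the peer receives the reply twice.

```python
import os
import socket


def duplicated_write():
    """Return everything the peer receives after parent and child both flush."""
    conn, peer = socket.socketpair()
    # Hypothetical pending output: queued in memory, not yet written.
    pending = [b"reply\n"]

    pid = os.fork()
    try:
        # Runs in BOTH processes: each has its own copy of `pending`
        # but writes it to the same shared socket.
        for chunk in pending:
            conn.sendall(chunk)
        pending.clear()
    finally:
        if pid == 0:
            os._exit(0)  # child exits; its descriptor copies are closed

    os.waitpid(pid, 0)
    conn.close()  # last open copy of this end, so `peer` will see EOF

    received = b""
    while True:
        data = peer.recv(1024)
        if not data:
            break
        received += data
    peer.close()
    return received
```

The peer ends up with `b"reply\nreply\n"`: one copy from the parent and one from the child.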
Re: RE: [Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss
[Dieter Maurer]
> The problem occurred in a ZEO client which called asyncore.poll in the forked subprocess. This poll deterministically stole ZEO server invalidation messages from the parent.

I'm sorry, but this is still too vague to guess what happened.

- Which operating system was in use?

- Which thread package?

- In the ZEO client that called fork(), did it call fork() directly, or indirectly as the result of a system() or popen() call? Or what? I'd like to understand a specific failure before rushing to generalization.

- In the ZEO client that called fork() (whether directly or indirectly), was fork called *from* the thread running ZEO's asyncore loop, or from a different thread?

> I read the Linux fork manual page and found:
>
>     fork creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.
>     ...
>     The fork call conforms to SVr4, SVID, POSIX, X/OPEN, BSD 4.3

If it conforms to POSIX (as it says it does), then fork() also has to satisfy the huge list of requirements I referenced before:

    http://www.opengroup.org/onlinepubs/009695399/functions/fork.html

That page is the current POSIX spec for fork().

> I concluded that if the only differences are in the PID/PPID and resource utilizations, there is no difference in the threads between parent and child.

Except that if you're running non-POSIX LinuxThreads, a thread *is* a process (there's a one-to-one relationship under LinuxThreads, not the many-to-one relationship in POSIX), in which case "no difference in threads" is trivially true.

> This would mean that the widespread asyncore.mainloop threads could suffer the same message loss and message duplication.

That's why all sane <wink> threading implementations do what POSIX does on a fork(). fork() and threading don't really mix well under POSIX either, but the fork+exec model for starting a new process is an historical burden that bristles with subtle problems in a multithreaded world; POSIX introduced posix_spawn() and posix_spawnp() for sane(r) process creation, ironically moving closer to what most non-Unix systems have always done to create a new process.

> I did not observe message loss/duplication in any application with an asyncore.mainloop thread.

I don't understand. You said that you *have* seen message loss/duplication in a ZEO client, and I assume the ZEO client was running an asyncore thread. If so, then you have seen loss/duplication in an application with an asyncore thread. Or are you saying that you haven't seen loss/duplication under the specific Linux flavor whose man page you quoted, but have seen it under some other (so far unidentified) system?

> Maybe the Linux fork manual page is simply imprecise with respect to threads, and the problem does not occur in applications with a standard asyncore.mainloop thread.

That fork manpage is clearly missing a mountain of crucial details (or it's not telling the truth about being POSIX-compliant). fork() is historically poorly documented, though.
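Python exposes the posix_spawn() family Tim mentions as os.posix_spawn() and os.posix_spawnp() (Python 3.8 and later, on platforms that provide them). This is a minimal sketch, assuming a POSIX system; the helper name is hypothetical. The new process starts directly in the program it is given, so no copy of the parent's interpreter, and hence no copy of an asyncore loop, ever runs in the child. It also relies on the fact that, since Python 3.4 (PEP 446), descriptors Python creates are non-inheritable by default, so the parent's sockets are closed in the spawned program unless inheritance is requested explicitly.

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv[0] with posix_spawnp and return its exit code.

    argv[0] is looked up on PATH unless it contains a slash.
    """
    # No fork() of the interpreter: the child begins in the new program.
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)  # Python 3.9+
```

For example, `spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"])` returns 3.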
Re: [Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss
Tim Peters wrote:

hello tim,

so can we safely assume that zeo does not mix the asyncore implementation with forks or threads and hence does not suffer from the child concurrently operating on sockets along with parent syndrome that dieter is experiencing? appreciate any clarifications.

Regards
sathya

> [Dieter Maurer]
>> ATTENTION: Crosspost -- Reply-To set to '[EMAIL PROTECTED]'
>
> Which I've honored.
>
>> Today, I hit a nasty error.
>>
>> The error affects applications under Unix (and maybe Windows) which
>>
>>   * use an asyncore mainloop thread (and maybe other asyncore applications)
>>
>>     Zope and many ZEO clients belong to this class
>
> Note a possible complication: ZEO monkey-patches asyncore, replacing its loop() function with one of its own. This is done in ZODB's ThreadedAsync/LoopCallback.py.
>
>> and
>>
>>   * create subprocesses (via fork and system, popen or friends if they use fork internally (they do under Unix but I think not under Windows)).
>
> It may be an issue under Cygwin, but not under native Windows, which supports no way to clone a process; file descriptors may get inherited by child processes on Windows, but no code runs by magic.
>
>> The error can cause non-deterministic loss of messages (HTTP requests, ZEO server responses, ...) destined for the parent process. It also can cause the same output to be sent several times over sockets.
>>
>> The error is explained as follows:
>>
>> asyncore maintains a map from file descriptors to handlers. The asyncore main loop waits for any file descriptor to become active and then calls the corresponding handler.
>
> There's a key related point, though: asyncore.loop() terminates if it sees that the map has become empty. This appears to have consequences for the correctness of workarounds. For example, this is Python's current asyncore loop (the monkey-patched one ZEO installs is similar in this respect):
>
>     def loop(timeout=30.0, use_poll=False, map=None):
>         if map is None:
>             map = socket_map
>
>         if use_poll and hasattr(select, 'poll'):
>             poll_fun = poll2
>         else:
>             poll_fun = poll
>
>         while map:
>             poll_fun(timeout, map)
>
> If map becomes empty, loop() exits.
>
>> When a process forks, the complete state, including file descriptors, threads and memory state, is copied and the new process executes in this copied state. We now have 2 asyncore threads waiting for the same events.
>
> Sam Rushing created asyncore as an alternative to threaded approaches; mixing asyncore with threads is a nightmare; throwing forks into the pot too is a good working definition of hell <wink>.
>
>> File descriptors are shared between parent and child. When the child reads from a file descriptor from its parent, it steals the corresponding message: the message will not reach the parent.
>>
>> While file descriptors are shared, memory state is separate. Therefore, pending writes can be performed by both parent and child -- leading to duplicate writes to the same file descriptor.
>>
>> A workaround is to deactivate asyncore before forking (or system, popen, ...) and reactivate it afterwards, as exemplified in the following code:
>>
>>     from asyncore import socket_map
>>     saved_socket_map = socket_map.copy()
>>     socket_map.clear() # deactivate asyncore
>
> As noted above, this may (or may not) cause asyncore.loop() to plain stop, in parent and/or in child process. If there aren't multiple threads, it's safe, but presumably you have multiple threads in mind, in which case behavior seems unpredictable (will the parent process's thread running asyncore.loop() notice that the map has become empty before the code below populates the map again? asyncore.loop() will or won't stop in the parent depending on that timing accident).
>
>>     pid = None
>>     try:
>>         pid = fork()
>>         if (pid == 0):
>>             # child
>>             # ...
>>     finally:
>>         if pid != 0:
>>             socket_map.update(saved_socket_map) # reactivate asyncore
>
> Another approach I've seen is to skip mucking with socket_map directly, and call asyncore.close_all() first thing in the child process. Of course that's vulnerable to vagaries of thread scheduling too, if asyncore is running in a thread other than the one doing the fork() call.
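A present-day Python counterpart to calling asyncore.close_all() first thing in the child is os.register_at_fork() (Python 3.7 and later): its after_in_child hook runs in the child before os.fork() returns there, so no other child-side code sees the inherited sockets first. This is a minimal sketch, assuming a POSIX system; SOCKET_MAP is a hypothetical stand-in for asyncore.socket_map (asyncore itself was removed from the standard library in Python 3.12), and the function names and message text are made up. Closing a socket in the child drops only the child's descriptor, so the parent's connection and any message waiting on it are untouched. Hooks registered this way cannot be unregistered and run on every later os.fork() in the process.

```python
import os
import socket

# Hypothetical stand-in for asyncore.socket_map: fileno -> socket object.
SOCKET_MAP = {}


def _close_all_in_child():
    # Runs in the child right after fork(): drop the child's copies of the
    # parent's sockets so it can neither read from nor write to them.
    for sock in list(SOCKET_MAP.values()):
        sock.close()  # closes only the child's descriptor
    SOCKET_MAP.clear()


os.register_at_fork(after_in_child=_close_all_in_child)


def parent_keeps_message():
    """Return (size of the child's map, bytes the parent then reads)."""
    server, client = socket.socketpair()
    SOCKET_MAP[client.fileno()] = client
    server.sendall(b"invalidate oid 42")  # buffered before the fork

    report_r, report_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the hook has already run, so the map is empty here.
        try:
            os.write(report_w, str(len(SOCKET_MAP)).encode())
        finally:
            os._exit(0)

    os.close(report_w)
    os.waitpid(pid, 0)
    child_map_size = int(os.read(report_r, 16))
    os.close(report_r)

    data = client.recv(1024)  # the parent still receives its message
    del SOCKET_MAP[client.fileno()]
    server.close()
    client.close()
    return child_map_size, data
```

Calling `parent_keeps_message()` returns `(0, b"invalidate oid 42")`: the child started with an empty map and the parent's message survived.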