Changes by Greg Brockman g...@mit.edu:
--
nosy: +gdb
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8296
___
___
Python-bugs-list mailing list
Greg Brockman g...@mit.edu added the comment:
Hmm, a few notes. I have a bunch of nitpicks, but those can wait for a later
iteration. (Just one style nit: I noticed a few unneeded whitespace changes...
please try not to do that, as it makes the patch harder to read.)
- Am I correct that you
Greg Brockman g...@mit.edu added the comment:
Ah, you're right--sorry, I had misread your code. I hadn't noticed
the usage of the worker_pids. This explains what you're doing with
the ACKs. Now, the problem is, I think doing it this way introduces
some races (which is why I introduced the ACK
Greg Brockman g...@mit.edu added the comment:
Thanks for looking at it! Basically this patch requires the parent process to
be able to send a message to a particular worker. As far as I can tell, the
existing queues allow the children to send a message to the parent, or the
parent to send
Greg Brockman g...@mit.edu added the comment:
I'll take another stab at this. In the attachment (assign-tasks.patch), I've
combined a lot of the ideas presented on this issue, so thank you both for your
input. Anyway:
- The basic idea of the patch is to record the mapping of tasks
New submission from Greg Brockman g...@mit.edu:
Upon os.fork(), pending signals are inherited by the child process. This can
be demonstrated by pressing C-c in the middle of the
following program:
import os, sys, time, threading

def do_fork():
    while True:
        if not os.fork
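The quoted program is cut off above. Here is a hypothetical, restructured completion (the loop body and child behavior beyond the quoted lines are my guesses, not the original code) that can be run interactively on a Unix host. Pressing C-c while it loops can raise KeyboardInterrupt in both the parent and a just-forked child, since the pending-signal flag set by CPython's C-level handler lives in ordinary process memory and is copied by fork().

```python
import os
import time

def do_fork(iterations=5):
    """Fork repeatedly; each child exits immediately with status 0.

    Run this in a terminal and press C-c mid-loop to observe the
    KeyboardInterrupt surfacing in a child as well as the parent.
    """
    statuses = []
    for _ in range(iterations):
        pid = os.fork()              # Unix-only
        if not pid:
            # Child: a C-c delivered just before fork() can surface here,
            # because the tripped-signal flag was copied from the parent.
            os._exit(0)
        statuses.append(os.waitpid(pid, 0)[1])
        time.sleep(0.01)
    return statuses

if __name__ == "__main__":
    print(do_fork())
```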
Greg Brockman g...@ksplice.com added the comment:
You can't have a sensible default timeout, because the worker may be
processing something important...
In my case, the jobs are either functional or idempotent anyway, so aborting
halfway through isn't a problem. In general though, I'm
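For callers who, like this, know their jobs are idempotent, the standard way to impose a deadline without a library-wide default is `apply_async` plus `AsyncResult.get(timeout=...)`, which raises `multiprocessing.TimeoutError` while leaving the worker untouched. A minimal sketch (the "fork" context assumes a Unix host; names are illustrative):

```python
import multiprocessing
import time

def slow_job(x):
    time.sleep(10)       # stands in for a long-running, idempotent task
    return x * 2

def run_with_deadline(timeout=0.5):
    """Return the job's result, or None if the deadline expires."""
    ctx = multiprocessing.get_context("fork")   # assumption: Unix host
    with ctx.Pool(1) as pool:
        result = pool.apply_async(slow_job, (21,))
        try:
            return result.get(timeout=timeout)
        except multiprocessing.TimeoutError:
            # Only this wait gives up; the worker keeps running until
            # the pool is terminated on context exit.
            return None

if __name__ == "__main__":
    print(run_with_deadline())
```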
Greg Brockman g...@ksplice.com added the comment:
Thanks for the comment. It's good to know what constraints we have to deal
with.
> we can not, however, change the API.
Does this include adding optional arguments?
Changes by Greg Brockman g...@ksplice.com:
--
nosy: +gdb
http://bugs.python.org/issue9334
Greg Brockman g...@ksplice.com added the comment:
I thought the EOF errors would take care of that, at least this has
been running in production on many platforms without that happening.
There are a lot of corner cases here, some more pedantic than others. For
example, suppose a child dies
Greg Brockman g...@ksplice.com added the comment:
At first glance, looks like there are a number of sites where you don't change
the blocking calls to non-blocking calls (e.g. get()). Almost all of the
get()s have the potential to be called when there is no possibility for them to
terminate
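The distinction at issue: a bare `get()` blocks forever, while `get(timeout=...)` periodically returns control so a supervising loop can notice that the other end has died. A minimal illustration with `multiprocessing.Queue` (the function name and timeout value are mine, for illustration only):

```python
import queue                # provides the Empty exception
import multiprocessing

def drain(q, per_item_timeout=0.2):
    """Collect available items without risking an unbounded block."""
    items = []
    while True:
        try:
            # A bare q.get() here could hang forever if the producer has
            # already died; the timeout keeps the loop supervisable.
            items.append(q.get(timeout=per_item_timeout))
        except queue.Empty:
            return items

q = multiprocessing.Queue()
for i in range(3):
    q.put(i)
collected = drain(q)
```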
Greg Brockman g...@ksplice.com added the comment:
Before I forget, looks like we also need to deal with the
result from a worker being un-unpickleable:
> This is what my patch in bug 9244 does...
Really? I could be misremembering, but I believe you deal with the case of the
result being
Greg Brockman g...@ksplice.com added the comment:
Actually, the program you demonstrate is nonequivalent to the one I posted.
The one I posted pickles just fine because 'bar' is a global name, but doesn't
unpickle because it doesn't exist in the parent's namespace. (See
http
Greg Brockman g...@ksplice.com added the comment:
Started looking at your patch. It seems to behave reasonably, although it
still doesn't catch all of the failure cases. In particular, as you note,
crashed jobs won't be noticed until the pool shuts down... but if you make a
blocking call
Greg Brockman g...@ksplice.com added the comment:
Before I forget, looks like we also need to deal with the result from a worker
being un-unpickleable:
#!/usr/bin/env python
import multiprocessing

def foo(x):
    global bar
    def bar(x):
        pass
    return bar

p = multiprocessing.Pool(1)
p.apply
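The asymmetry described later in this thread — the result "pickles just fine because 'bar' is a global name, but doesn't unpickle because it doesn't exist in the parent's namespace" — can be shown with `pickle` alone, without risking a hung pool. The sketch below assumes current CPython, where the `global` declaration makes `bar` a plain module-level name, so it pickles by reference; deleting the name before unpickling simulates the parent process:

```python
import pickle

def foo():
    global bar
    def bar(y):          # global declaration: bar becomes a module-level name
        return y
    return bar

fn = foo()
data = pickle.dumps(fn)     # succeeds: functions pickle by reference (module + name)
del globals()["bar"]        # simulate the parent, which never defined bar
try:
    pickle.loads(data)
    outcome = "unpickled"
except AttributeError:
    # This is the error the parent's result-handler thread hits, which is
    # what left the pool hanging in the interpreters discussed here.
    outcome = "AttributeError"
```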
Greg Brockman g...@ksplice.com added the comment:
> What kind of errors are you having that makes the get() call fail?
Try running the script I posted. It will fail with an AttributeError (raised
during unpickling) and hang.
I'll note that the particular issues that I've run into in practice
Greg Brockman g...@ksplice.com added the comment:
This looks pretty reasonable to my untrained eye. I successfully applied and
ran the test suite.
To be clear, the errback change and the unpickleable result change are actually
orthogonal, right? Anyway, I'm not really familiar
Greg Brockman g...@ksplice.com added the comment:
While looking at your patch in issue 9244, I realized that my code fails to
handle an unpickleable task, as in:
#!/usr/bin/env python
import multiprocessing
foo = lambda x: x
p = multiprocessing.Pool(1)
p.apply(foo, [1])
This should be fixed
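On the interpreters discussed in this thread, the snippet above hung. On current Python 3 — thanks in part to fixes that grew out of these reports — the task handler catches the pickling failure and fails only that job, so `apply` raises `pickle.PicklingError` in the caller. A sketch of the modern behavior (the "fork" context assumes a Unix host):

```python
import multiprocessing
import pickle

foo = lambda x: x      # pickled by name "<lambda>", which lookup cannot find

def submit_unpicklable():
    """Return the exception name raised when the task cannot be pickled."""
    ctx = multiprocessing.get_context("fork")   # assumption: Unix host
    with ctx.Pool(1) as p:
        try:
            p.apply(foo, [1])
        except pickle.PicklingError:
            return "PicklingError"
    return None
```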
Greg Brockman g...@ksplice.com added the comment:
With pool.py:272 commented out, running about 50k iterations, I saw 4
tracebacks giving an exception on pool.py:152. So this seems to imply the race
does exist (i.e. that the thread is in _maintain_pool rather than time.sleep
when shutdown
Greg Brockman g...@ksplice.com added the comment:
Thanks much for taking a look at this!
> why are you terminating the second pass after finding a failed
> process?
Unfortunately, if you've lost a worker, you are no longer guaranteed that cache
will eventually be empty. In particular, you may
Greg Brockman g...@ksplice.com added the comment:
> For processes disappearing (if that can at all happen), we could solve
> that by storing the jobs a process has accepted (started working on),
> so if a worker process is lost, we can mark them as failed too.
Sure, this would be reasonable
Greg Brockman g...@ksplice.com added the comment:
Cool, thanks. I'll note that with this patch applied, using the test program
from 9207 I consistently get the following exception:
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last
Greg Brockman g...@ksplice.com added the comment:
What about just catching the exception? See e.g. the attached patch.
(Disclaimer: not heavily tested).
--
Added file: http://bugs.python.org/file17934/shutdown.patch
Greg Brockman g...@ksplice.com added the comment:
For what it's worth, I think I have a simpler reproducer of this issue. Using
freshly-compiled python-from-trunk (as well as multiprocessing-from-trunk), I
get tracebacks from the following about 30% of the time:
import multiprocessing, time
Greg Brockman g...@ksplice.com added the comment:
I'm on Ubuntu 10.04, 64 bit.
--
http://bugs.python.org/issue4106
New submission from Greg Brockman g...@ksplice.com:
I have recently begun using multiprocessing for a variety of batch
jobs. It's a great library, and it's been quite useful. However, I have been
bitten several times by situations where a worker process in a Pool will
unexpectedly die
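The `multiprocessing.Pool` of that era had no detection for a worker dying mid-task. The sibling `concurrent.futures` API later gained exactly this: an abruptly killed worker marks the pool broken and pending futures raise `BrokenProcessPool`. A sketch of that behavior (the "fork" context assumes a Unix host; function names are mine):

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

def die(x):
    os._exit(1)          # simulate a worker killed mid-task: no exception, no cleanup

def outcome_of_dead_worker():
    """Return the exception name observed when a worker dies hard."""
    ctx = multiprocessing.get_context("fork")   # assumption: Unix host
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as ex:
        fut = ex.submit(die, 0)
        try:
            fut.result()
        except BrokenProcessPool:
            return "BrokenProcessPool"
    return None
```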
New submission from Greg Brockman g...@ksplice.com:
On Ubuntu 10.04, using freshly-compiled python-from-trunk (as well as
multiprocessing-from-trunk), I get tracebacks from the following about 30% of
the time:
import multiprocessing, time
def foo(x
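The reproducer is truncated above, so its exact body is unknown. The class of traceback it produced — pool helper threads erroring during interpreter shutdown — is generally avoided by closing and joining the pool explicitly before teardown begins. A sketch under that assumption (the "fork" context assumes a Unix host; the job body is illustrative, not the original):

```python
import multiprocessing
import time

def foo(x):
    time.sleep(0.01)
    return x

def run_and_shut_down():
    """Run a batch and tear the pool down deterministically."""
    ctx = multiprocessing.get_context("fork")   # assumption: Unix host
    pool = ctx.Pool(2)
    try:
        results = pool.map(foo, range(10))
    finally:
        pool.close()   # stop accepting work
        pool.join()    # reap workers before interpreter shutdown begins
    return results

if __name__ == "__main__":
    print(run_and_shut_down())
```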
Greg Brockman g...@ksplice.com added the comment:
Sure thing. See http://bugs.python.org/issue9207.
--
http://bugs.python.org/issue4106
Greg Brockman g...@ksplice.com added the comment:
That's likely a mistake on my part. I'm not observing this using the stock
version of multiprocessing on my Ubuntu machine (after running O(100) times). I
do, however, observe it when using either python2.7 or python2.6 with
multiprocessing
Greg Brockman g...@ksplice.com added the comment:
No, I'm not using the Google code backport.
To be clear, I've tried testing this with two versions of multiprocessing:
- multiprocessing-from-trunk (r82645): I get these exceptions with ~40%
frequency
- multiprocessing from Ubuntu 10.04
Greg Brockman g...@ksplice.com added the comment:
> Wait - so, you are pulling svn trunk, compiling and running your test
> with the built python executable?
Yes. I initially observed this issue while using 10.04's Python (2.6.5), but
wanted to make sure it wasn't fixed by using a newer
Greg Brockman g...@ksplice.com added the comment:
Yeah, I've just taken a checkout from trunk, ran './configure && make && make
install', and reproduced on:
- Ubuntu 10.04 32-bit
- Ubuntu 9.04 32-bit
Greg Brockman g...@ksplice.com added the comment:
With the line commented out, I no longer see any exceptions.
Although, if I understand what's going on, there's still a (much rarer)
possibility of an exception, right? I guess in the common case, the
worker_handler is in the sleep when