Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-12 Thread Charles R Harris
On Thu, Dec 11, 2008 at 8:20 AM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > Hi there, > > I have been using the multiprocessing module a lot to do statistical tests > such as Monte Carlo or resampling, and I have just discovered something > that makes me wonder if I haven't been accu

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Robert Kern
On Thu, Dec 11, 2008 at 13:06, Sturla Molden wrote: > >> Create RandomState objects and use those. This is a best practice >> whether you are using multiprocessing or not. The module-level >> functions really should only be used for noodling around in IPython. > > Are we guaranteed that two Random

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Sturla Molden
> Create RandomState objects and use those. This is a best practice > whether you are using multiprocessing or not. The module-level > functions really should only be used for noodling around in IPython. Are we guaranteed that two RandomStates will produce two independent sequences? If not, Rando

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Sturla Molden
In the docs I found this: "We used a hypothesis that a set of PRNGs based on linear recurrences is mutually 'independent' if the characteristic polynomials are relatively prime to each other. There is no rigorous proof of this hypothesis..." S.M. > Here is the c program and the description

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Robert Kern
On Thu, Dec 11, 2008 at 07:20, Gael Varoquaux wrote: > The take home message is: > **call 'numpy.random.seed()' when you are using multiprocessing** Create RandomState objects and use those. This is a best practice whether you are using multiprocessing or not. The module-level functions really s
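A minimal sketch of the RandomState-per-task pattern, assuming the parent hands each task its own integer seed; the names `task` and `run` and the one-seed-per-task scheme are illustrative, not taken from the thread. Because each generator is built inside the task from an explicit seed, a task's result depends only on its seed, not on which worker process happens to run it.

```python
import numpy as np
from multiprocessing import Pool


def task(args):
    """Hypothetical worker: draw n uniforms from an explicitly seeded generator."""
    seed, n = args
    # A private RandomState: no hidden global state inherited across fork().
    rng = np.random.RandomState(seed)
    return rng.random_sample(n)


def run(seeds, n, processes=4):
    """Map one task per seed over a pool; Pool.map returns results in seed order."""
    with Pool(processes) as pool:
        return pool.map(task, [(seed, n) for seed in seeds])
```

Call `run(range(4), 4)` from under an `if __name__ == "__main__":` guard. Distinct seeds give distinct, reproducible streams, but, as Sturla asks in his reply, that alone does not formally guarantee statistically independent sequences.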

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread josef . pktd
Here is the C program and the description of how to implement independent Mersenne Twister PRNGs by the inventor(s) of Mersenne Twister: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html I didn't see a license statement. Josef
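The Dynamic Creator linked above produces distinct Mersenne Twister parameter sets. NumPy releases from 1.17 on (long after this thread) address the same need differently: `numpy.random.SeedSequence.spawn` derives child seeds intended to give independent and very probably non-overlapping streams. The sketch below uses that API in place of the Dynamic Creator; the helper name `make_streams` is hypothetical.

```python
import numpy as np


def make_streams(entropy, n_streams):
    """Hypothetical helper: one Generator per worker from a single root seed."""
    root = np.random.SeedSequence(entropy)
    # spawn() gives each child the root entropy plus a distinct spawn key,
    # so the derived MT19937 states are, with very high probability, far apart.
    children = root.spawn(n_streams)
    return [np.random.Generator(np.random.MT19937(child)) for child in children]


streams = make_streams(12345, 4)
draws = [g.random(4) for g in streams]
```

Each worker would receive one child (or one `Generator`) as an argument; rerunning with the same root entropy reproduces every stream.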

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Sturla Molden
I'd just like to add that yet another option would be to use the manager/proxy object in multiprocessing. In this case numpy.random.random will be called in the parent process. I have not used this and I am not sure how efficient it is. But the possibility is there. Sturla Molden === tes
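A sketch of the manager/proxy idea, assuming Python 3's `multiprocessing.managers.BaseManager`; the class name `RNGManager` and the `task` helper are hypothetical. Strictly, the draws run in the manager's server process (which the parent starts), so there is a single stream; every call is a round trip through a pipe, which is the efficiency cost hinted at above.

```python
import numpy as np
from multiprocessing.managers import BaseManager


class RNGManager(BaseManager):
    """Hypothetical manager hosting RandomState objects in its server process."""


# Expose RandomState through the manager: the returned proxy forwards public
# method calls (e.g. random_sample) to the one generator in the server.
RNGManager.register("RandomState", np.random.RandomState)


def task(rng, n):
    # rng may be a proxy or a local RandomState; with a proxy, every worker
    # advances the same shared stream, so no two calls repeat each other.
    return rng.random_sample(n)
```

Under an `if __name__ == "__main__":` guard, `with RNGManager() as m: rng = m.RandomState(0)` yields a picklable proxy that can be passed to worker processes as a task argument.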

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread josef . pktd
> >> Is the goal to parallelize a big sampler into N tasks of M trials, to >> produce the same result as a sequential set of M*N trials ? Then it does >> not sound like a trivial task at all. I know there exist libraries >> explicitly designed for parallel random number generation - maybe this >> is w
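If N tasks of M trials must reproduce exactly the numbers of one sequential run of M*N trials, a generator that can skip ahead does it. NumPy's RandomState offered no skip-ahead; the sketch below instead assumes NumPy 1.17 or newer and its `PCG64` bit generator, whose `advance(delta)` moves the state forward as if `delta` raw 64-bit draws had been made. It relies on `Generator.random` consuming exactly one raw draw per float64, which holds for standard uniform doubles but not for every distribution. `chunk` is a hypothetical task function.

```python
import numpy as np


def chunk(seed, task_index, m):
    """Hypothetical task: the task_index-th block of m uniforms from one stream."""
    bitgen = np.random.PCG64(seed)
    # Skip the raw draws consumed by earlier tasks; each float64 produced by
    # Generator.random uses exactly one 64-bit PCG64 output.
    bitgen.advance(task_index * m)
    return np.random.Generator(bitgen).random(m)


seed, n_tasks, m = 2008, 4, 5
parallel = np.concatenate([chunk(seed, i, m) for i in range(n_tasks)])
sequential = np.random.Generator(np.random.PCG64(seed)).random(n_tasks * m)
```

The blocks must be concatenated in task-index order; `Pool.map` preserves that order, so each `chunk` call can run in a separate worker.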

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread David Cournapeau
On Fri, Dec 12, 2008 at 2:49 AM, Gael Varoquaux wrote: > On Fri, Dec 12, 2008 at 02:29:55AM +0900, David Cournapeau wrote: >> The seed could be explicitly set in each task, no ? > >> def task(x): >> np.random.seed() >> return np.random.random(x) > > Yes. The problem is trivial to solve, on

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread David Cournapeau
On Fri, Dec 12, 2008 at 3:00 AM, Bruce Southey wrote: > David Cournapeau wrote: >> Sturla Molden wrote: >> >>> On 12/11/2008 6:10 PM, Michael Gilbert wrote: >>> >>> >>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent this kind of error? A simple enough solution would

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Sturla Molden
On 12/11/2008 6:29 PM, David Cournapeau wrote: > def task(x): > np.random.seed() > return np.random.random(x) > > But does this really make sense ? Hard to say... There is a chance of this producing identical or overlapping sequences, albeit unlikely. I would not do this. I'd make one

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Bruce Southey
David Cournapeau wrote: > Sturla Molden wrote: > >> On 12/11/2008 6:10 PM, Michael Gilbert wrote: >> >> >> >>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent >>> this kind of error? A simple enough solution would be to also include >>> the process id as part of the

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Gael Varoquaux
On Fri, Dec 12, 2008 at 02:29:55AM +0900, David Cournapeau wrote: > The seed could be explicitly set in each task, no ? > def task(x): > np.random.seed() > return np.random.random(x) Yes. The problem is trivial to solve, once you are aware of it. Just like the integer division problems we

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread David Cournapeau
Sturla Molden wrote: > On 12/11/2008 6:10 PM, Michael Gilbert wrote: > > >> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent >> this kind of error? A simple enough solution would be to also include >> the process id as part of the seed >> > > It would not help, as the s

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Sturla Molden
On 12/11/2008 6:21 PM, Sturla Molden wrote: > It would not help, as the seeding is done prior to forking. > > I am mostly familiar with Windows programming. But what is needed is a > fork handler (similar to a system hook in Windows jargon) that sets a > new seed in the child process. Actually

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Sturla Molden
On 12/11/2008 6:10 PM, Michael Gilbert wrote: > Shouldn't numpy (and/or multiprocessing) be smart enough to prevent > this kind of error? A simple enough solution would be to also include > the process id as part of the seed It would not help, as the seeding is done prior to forking. I am most

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread David Cournapeau
Michael Gilbert wrote: >> Exactly, change task_helper.py to >> >> >> import numpy as np >> >> def task(x): >> import os >> print "Hi, I'm", os.getpid() >> return np.random.random(x) >> >> >> and note the output >> >> >> Hi, I'm 16197 >> Hi, I'm 16198 >> Hi, I'm 16199 >> H
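The repeated PID (16199 ran two tasks) accounts for the pattern in the original report. A single-process simulation reproduces it, assuming `fork()` simply copies the parent's generator state into each worker: workers 0, 1 and 2 start from the same state, so their first draws agree, while the second task on worker 2 continues that worker's stream and differs. `simulate_pool` and its `assignment` argument are hypothetical.

```python
import numpy as np


def simulate_pool(parent_seed, assignment, n=4):
    """Hypothetical model of a forked Pool: each worker starts from a copy of
    the parent's RNG state; assignment[i] is the worker that runs task i."""
    parent_state = np.random.RandomState(parent_seed).get_state()
    workers = {}
    results = []
    for worker_id in assignment:
        if worker_id not in workers:
            # What fork() does implicitly: the child gets an exact copy.
            rng = np.random.RandomState()
            rng.set_state(parent_state)
            workers[worker_id] = rng
        results.append(workers[worker_id].random_sample(n))
    return results


# Four tasks, but worker 2 picks up two of them (like PID 16199).
results = simulate_pool(0, [0, 1, 2, 2])
```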

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Michael Gilbert
> Exactly, change task_helper.py to > > > import numpy as np > > def task(x): > import os > print "Hi, I'm", os.getpid() > return np.random.random(x) > > > and note the output > > > Hi, I'm 16197 > Hi, I'm 16198 > Hi, I'm 16199 > Hi, I'm 16199 > [ 0.58175647 0.16293922

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Pauli Virtanen
Thu, 11 Dec 2008 17:55:58 +0100, Sturla Molden wrote: [clip] > Sure, a pool is fine. I was just speculating that one of the four > processes in your pool was idle all the time; i.e. that one of the other > three got to do the task twice. Therefore you only got three identical > results and not four

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Gael Varoquaux
On Thu, Dec 11, 2008 at 05:55:58PM +0100, Sturla Molden wrote: > > No, Pool is what I want, because in my production code I am submitting > > jobs to that pool. > Sure, a pool is fine. I was just speculating that one of the four > processes in your pool was idle all the time; i.e. that one of the

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Sturla Molden
On 12/11/2008 5:39 PM, Gael Varoquaux wrote: >>> Why do you say the results are the same ? They don't look the same to >>> me - only the first three are the same. > >> He used the multiprocessing.Pool object. There is a possible race >> condition here: one or more of the forked processes may be

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Bruce Southey
Gael Varoquaux wrote: > Hi there, > > I have been using the multiprocessing module a lot to do statistical tests > such as Monte Carlo or resampling, and I have just discovered something > that makes me wonder if I haven't been accumulating false results. Given > two files: > > === test.py === > fr

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Gael Varoquaux
On Thu, Dec 11, 2008 at 05:36:47PM +0100, Gael Varoquaux wrote: > b) /dev/urandom was used to seed. This seems wrong. Reading the code > shows no /dev/urandom in the seeding parts. Actually, I am wrong here. /dev/urandom is indeed used in 'rk_devfill', used in the seeding routine. It seem

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Gael Varoquaux
On Thu, Dec 11, 2008 at 10:20:48AM -0600, Bruce Southey wrote: > Part of this is one of the gotchas of simulation that is not specific > to multiprocessing and Python. Just highly likely to occur in your case > with multiprocessing but does occur in single processing. As David > indicated, man

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Gael Varoquaux
On Thu, Dec 11, 2008 at 05:23:12PM +0100, Sturla Molden wrote: > On 12/11/2008 4:57 PM, David Cournapeau wrote: > > Why do you say the results are the same ? They don't look the same to > > me - only the first three are the same. > He used the multiprocessing.Pool object. There is a possible race

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Gael Varoquaux
On Fri, Dec 12, 2008 at 12:57:26AM +0900, David Cournapeau wrote: > > [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ > > 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, > > 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382, > > 0

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Sturla Molden
On 12/11/2008 4:57 PM, David Cournapeau wrote: > Why do you say the results are the same ? They don't look the same to > me - only the first three are the same. He used the multiprocessing.Pool object. There is a possible race condition here: one or more of the forked processes may be doing not

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Pauli Virtanen
Fri, 12 Dec 2008 00:57:26 +0900, David Cournapeau wrote: [clip] > On Fri, Dec 12, 2008 at 12:20 AM, Gael Varoquaux wrote: > [clip] >> Now I understand why this is the case: the different instances of the >> random number generator were created by forking from the same process, >> so they are exact

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread David Cournapeau
On Fri, Dec 12, 2008 at 12:57 AM, David Cournapeau wrote: > Taking a look at the mtrand code in numpy, if the seed is not given, > it is taken from /dev/random if available, or the time clock if not; I > don't know what the semantics are for concurrent access to /dev/random > (
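A sketch of the seed selection described here, written in Python to mirror the thread's description rather than NumPy's actual C code (Gael's follow-up notes the real source is /dev/urandom); `fresh_seed` and `task` are hypothetical. The key point is where it runs: inside the task, after any fork, never once in the parent.

```python
import os
import time

import numpy as np


def fresh_seed():
    """Hypothetical helper: a 32-bit seed from OS entropy, else from the clock."""
    try:
        # os.urandom reads the OS entropy source (/dev/urandom on Unix).
        return int.from_bytes(os.urandom(4), "little")
    except NotImplementedError:
        # No entropy source: fall back to the clock. Processes seeded within
        # the same microsecond would still collide.
        return int(time.time() * 1e6) & 0xFFFFFFFF


def task(n):
    # Seed inside the task, i.e. after the fork, not in the parent.
    rng = np.random.RandomState(fresh_seed())
    return rng.random_sample(n)
```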

Re: [Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread David Cournapeau
On Fri, Dec 12, 2008 at 12:20 AM, Gael Varoquaux wrote: > Hi there, > > I have been using the multiprocessing module a lot to do statistical tests > such as Monte Carlo or resampling, and I have just discovered something > that makes me wonder if I haven't been accumulating fals

[Numpy-discussion] numpy.random and multiprocessing

2008-12-11 Thread Gael Varoquaux
Hi there, I have been using the multiprocessing module a lot to do statistical tests such as Monte Carlo or resampling, and I have just discovered something that makes me wonder if I haven't been accumulating false results. Given two files: === test.py === from test_helper import task from multip