On Thu, Dec 11, 2008 at 8:20 AM, Gael Varoquaux <
gael.varoqu...@normalesup.org> wrote:
> Hi there,
>
> I have been using the multiprocessing module a lot to do statistical tests
> such as Monte Carlo or resampling, and I have just discovered something
> that makes me wonder if I haven't been accu
On Thu, Dec 11, 2008 at 13:06, Sturla Molden wrote:
>
>> Create RandomState objects and use those. This is a best practice
>> whether you are using multiprocessing or not. The module-level
>> functions really should only be used for noodling around in IPython.
>
> Are we guaranteed that two Random
> Create RandomState objects and use those. This is a best practice
> whether you are using multiprocessing or not. The module-level
> functions really should only be used for noodling around in IPython.
Are we guaranteed that two RandomStates will produce two independent
sequences? If not, Rando
In the docs I found this:
"We used a hypothesis that a set of PRNGs based on linear recurrences is
mutually 'independent' if the characteristic polynomials are relatively
prime to each other. There is no rigorous proof of this hypothesis..."
S.M.
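For the independence question raised above, newer NumPy (1.17 and later, so not available to this 2008 thread) provides `SeedSequence.spawn`, which derives child seeds for parallel streams from one root seed. The sketch below assumes NumPy >= 1.17; `independent_generators` is a hypothetical helper name. Spawning is designed to make overlapping streams extremely unlikely, which is a probabilistic argument rather than a proof, much like the Mersenne Twister caveat quoted above.

```python
import numpy as np

def independent_generators(root_seed, n_streams):
    """Hypothetical helper: derive n_streams generators from a single root seed."""
    # Each child SeedSequence mixes the root entropy with its own spawn key,
    # so the child generators start from decorrelated states.
    children = np.random.SeedSequence(root_seed).spawn(n_streams)
    return [np.random.default_rng(child) for child in children]

if __name__ == "__main__":
    for i, rng in enumerate(independent_generators(12345, 4)):
        print(i, rng.random(3))
```

Re-running with the same root seed reproduces every stream, so the parallel run stays repeatable.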
> Here is the c program and the description
On Thu, Dec 11, 2008 at 07:20, Gael Varoquaux
wrote:
> The take home message is:
> **call 'numpy.random.seed()' when you are using multiprocessing**
Create RandomState objects and use those. This is a best practice
whether you are using multiprocessing or not. The module-level
functions really s
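The "create RandomState objects" advice could look like the following minimal sketch, in which each task receives its own seed and builds a private `numpy.random.RandomState`, so no worker touches the module-level state that forking copies. The names `task` and `run` and the seeds 1..4 are illustrative assumptions; distinct seeds give distinct starting points but do not by themselves prove the streams are independent (see Sturla's question in this thread).

```python
import multiprocessing as mp
import numpy as np

def task(args):
    """Draw n uniforms from a private RandomState seeded explicitly for this task."""
    seed, n = args
    # A per-task RandomState never uses numpy's hidden global generator,
    # so forked workers cannot end up sharing a copied state.
    rs = np.random.RandomState(seed)
    return rs.random_sample(n)

def run(seeds, n):
    # One (seed, n) pair per task; the output is reproducible for a given seed list.
    with mp.Pool(len(seeds)) as pool:
        return pool.map(task, [(s, n) for s in seeds])

if __name__ == "__main__":
    for row in run([1, 2, 3, 4], 4):
        print(row)
```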
Here is the C program and a description of how to implement independent
Mersenne Twister PRNGs, by the inventor(s) of Mersenne Twister:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html
I didn't see a license statement.
Josef
I'd just like to add that yet another option would be to use the
manager/proxy object in multiprocessing. In this case
numpy.random.random will be called in a single process (the manager's
server process) instead of in each worker. I have not used this and I am
not sure how efficient it is, but the possibility is there.
Sturla Molden
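Sturla's manager/proxy idea might be sketched like this, assuming a POSIX system where the "fork" start method is available. `RNGManager`, `draw` and `run` are hypothetical names; `BaseManager.register` exposes `numpy.random.RandomState` so that every `random_sample` call made through the proxy executes inside the manager's server process. Each draw is an inter-process round trip, so this is likely much slower than local generation, in line with Sturla's efficiency doubt.

```python
import multiprocessing as mp
from multiprocessing.managers import BaseManager
import numpy as np

class RNGManager(BaseManager):
    """Hypothetical manager type that hosts a RandomState in its server process."""

# Calls on the returned proxy are forwarded to, and executed in, the server
# process; only arguments and resulting arrays cross process boundaries.
RNGManager.register("RandomState", np.random.RandomState)

def draw(proxy, n):
    # Runs in a pool worker, but random_sample itself runs server-side.
    return proxy.random_sample(n)

def run(seed=0, n_tasks=4, n=3):
    ctx = mp.get_context("fork")  # assumption: POSIX, fork start method available
    with RNGManager(ctx=ctx) as manager:
        shared_rs = manager.RandomState(seed)
        with ctx.Pool(n_tasks) as pool:
            # All workers draw from the one shared generator.
            return pool.starmap(draw, [(shared_rs, n)] * n_tasks)

if __name__ == "__main__":
    for row in run():
        print(row)
```

Because every task pulls from one stream, the union of all results is exactly the first `n_tasks * n` values of that single RandomState, in whatever order the server served the calls.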
=== tes
>
>> Is the goal to parallelize a big sampler into N tasks of M trials, to
>> produce the same result as a sequential set of M*N trials? Then it does
>> not sound like a trivial task at all. I know there exist libraries
>> explicitly designed for parallel random number generation - maybe this
>> is w
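Reproducing a sequential M*N run with N parallel tasks requires each task to jump ahead in one shared stream. As an assumption outside this 2008 thread, NumPy >= 1.17 offers `PCG64.advance`, which moves the state forward as if that many 64-bit outputs had been drawn. The sketch relies on `Generator.random` consuming exactly one 64-bit output per float64 with PCG64; `chunk` and `sequential` are hypothetical helpers.

```python
import numpy as np

def chunk(seed, task_index, m):
    """Hypothetical helper: return the task_index-th block of m uniforms of one stream."""
    bitgen = np.random.PCG64(seed)
    # Skip the m draws consumed by each earlier task (advance works in place).
    bitgen.advance(task_index * m)
    return np.random.Generator(bitgen).random(m)

def sequential(seed, n_tasks, m):
    """The same stream drawn in one go, for comparison."""
    return np.random.Generator(np.random.PCG64(seed)).random(n_tasks * m)

if __name__ == "__main__":
    parts = [chunk(7, i, 5) for i in range(4)]  # each call could run in its own process
    print(np.array_equal(np.concatenate(parts), sequential(7, 4, 5)))
```

Each `chunk` call needs only the seed and its index, so the tasks can run in any order on any worker.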
On Fri, Dec 12, 2008 at 2:49 AM, Gael Varoquaux
wrote:
> On Fri, Dec 12, 2008 at 02:29:55AM +0900, David Cournapeau wrote:
>> The seed could be explicitly set in each task, no?
>
>> def task(x):
>>     np.random.seed()
>>     return np.random.random(x)
>
> Yes. The problem is trivial to solve, on
On Fri, Dec 12, 2008 at 3:00 AM, Bruce Southey wrote:
> David Cournapeau wrote:
>> Sturla Molden wrote:
>>
>>> On 12/11/2008 6:10 PM, Michael Gilbert wrote:
>>>
>>>
>>>
Shouldn't numpy (and/or multiprocessing) be smart enough to prevent
this kind of error? A simple enough solution would
On 12/11/2008 6:29 PM, David Cournapeau wrote:
> def task(x):
>     np.random.seed()
>     return np.random.random(x)
>
> But does this really make sense?
Hard to say... There is a chance of this producing identical or
overlapping sequences, albeit unlikely. I would not do this. I'd make
one
David Cournapeau wrote:
> Sturla Molden wrote:
>
>> On 12/11/2008 6:10 PM, Michael Gilbert wrote:
>>
>>
>>
>>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent
>>> this kind of error? A simple enough solution would be to also include
>>> the process id as part of the
On Fri, Dec 12, 2008 at 02:29:55AM +0900, David Cournapeau wrote:
> The seed could be explicitly set in each task, no?
> def task(x):
>     np.random.seed()
>     return np.random.random(x)
Yes. The problem is trivial to solve, once you are aware of it. Just like
the integer division problems we
Sturla Molden wrote:
> On 12/11/2008 6:10 PM, Michael Gilbert wrote:
>
>
>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent
>> this kind of error? A simple enough solution would be to also include
>> the process id as part of the seed
>>
>
> It would not help, as the s
On 12/11/2008 6:21 PM, Sturla Molden wrote:
> It would not help, as the seeding is done prior to forking.
>
> I am mostly familiar with Windows programming. But what is needed is a
> fork handler (similar to a system hook in Windows jargon) that sets a
> new seed in the child process.
Actually
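The fork handler Sturla describes exists in Python 3.7+ on POSIX as `os.register_at_fork` (it did not exist in 2008). A minimal sketch, assuming fork-based children: `after_in_child=np.random.seed` reseeds numpy's global RandomState from fresh OS entropy in every forked child. `child_and_parent_draws` is a hypothetical helper that forks once and compares the child's draws with the parent's. Handlers cannot be unregistered, so this affects every later fork in the process.

```python
import os
import numpy as np

# After every os.fork() (including multiprocessing's "fork" start method),
# reseed numpy's global RandomState in the child; seed() with no argument
# pulls fresh entropy from the OS.
os.register_at_fork(after_in_child=np.random.seed)

def child_and_parent_draws(n):
    """Fork once; return (child's draws, parent's draws) from the global state."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: the handler above has already reseeded it
        os.close(r)
        os.write(w, np.random.random_sample(n).tobytes())
        os._exit(0)  # exit without running the parent's cleanup code
    os.close(w)
    parent = np.random.random_sample(n)  # parent keeps its original stream
    buf = b""
    while len(buf) < 8 * n:
        part = os.read(r, 8 * n - len(buf))
        if not part:  # EOF: the child exited early
            break
        buf += part
    os.close(r)
    os.waitpid(pid, 0)
    return np.frombuffer(buf, dtype=np.float64), parent

if __name__ == "__main__":
    print(*child_and_parent_draws(4), sep="\n")
```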
On 12/11/2008 6:10 PM, Michael Gilbert wrote:
> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent
> this kind of error? A simple enough solution would be to also include
> the process id as part of the seed
It would not help, as the seeding is done prior to forking.
I am most
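Including the pid in the seed fails when seeding happens before the fork, but reseeding inside each worker after it starts avoids the problem. A minimal sketch using `multiprocessing.Pool`'s `initializer` argument, assuming the POSIX "fork" start method to mirror the thread's setup; `reseed_worker`, `task` and `run` are hypothetical names. Calling `np.random.seed()` without an argument draws entropy from the OS; seeding from `os.getpid()` here would also happen in the child, but pids are a weaker seed source.

```python
import multiprocessing as mp
import numpy as np

def reseed_worker():
    # Runs once inside each worker after it has been forked, so the seed is
    # drawn in the child instead of being inherited from the parent.
    np.random.seed()

def task(n):
    return np.random.random_sample(n)

def run(n_tasks=4, n=4):
    ctx = mp.get_context("fork")  # assumption: POSIX, fork start method available
    with ctx.Pool(n_tasks, initializer=reseed_worker) as pool:
        return pool.map(task, [n] * n_tasks)

if __name__ == "__main__":
    for row in run():
        print(row)
```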
Michael Gilbert wrote:
>> Exactly, change task_helper.py to
>>
>>
>> import numpy as np
>>
>> def task(x):
>>     import os
>>     print("Hi, I'm", os.getpid())
>>     return np.random.random(x)
>>
>>
>> and note the output
>>
>>
>> Hi, I'm 16197
>> Hi, I'm 16198
>> Hi, I'm 16199
>> H
> Exactly, change task_helper.py to
>
>
> import numpy as np
>
> def task(x):
>     import os
>     print("Hi, I'm", os.getpid())
>     return np.random.random(x)
>
>
> and note the output
>
>
> Hi, I'm 16197
> Hi, I'm 16198
> Hi, I'm 16199
> Hi, I'm 16199
> [ 0.58175647 0.16293922
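The pid output above suggests the race discussed in this thread: one worker (16199) ran two tasks, and its second draw continued its stream instead of repeating the shared starting point. A diagnostic sketch, assuming the POSIX "fork" start method and no reseeding; `task` and `run` are hypothetical names. Each worker's first task returns the same numbers, while a worker's later tasks differ.

```python
import os
import multiprocessing as mp
import numpy as np

def task(n):
    # Report which worker ran the task alongside its draws.
    return os.getpid(), np.random.random_sample(n)

def run(n_tasks=4, n=4):
    ctx = mp.get_context("fork")  # fork copies the parent's global RandomState
    with ctx.Pool(n_tasks) as pool:
        return pool.map(task, [n] * n_tasks)

if __name__ == "__main__":
    for pid, draws in run():
        print(pid, draws)
```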
Thu, 11 Dec 2008 17:55:58 +0100, Sturla Molden wrote:
[clip]
> Sure, a pool is fine. I was just speculating that one of the four
> processes in your pool was idle all the time; i.e. that one of the other
> three got to do the task twice. Therefore you only got three identical
> results and not four
On Thu, Dec 11, 2008 at 05:55:58PM +0100, Sturla Molden wrote:
> > No, Pool is what I want, because in my production code I am submitting
> > jobs to that pool.
> Sure, a pool is fine. I was just speculating that one of the four
> processes in your pool was idle all the time; i.e. that one of the
On 12/11/2008 5:39 PM, Gael Varoquaux wrote:
>>> Why do you say the results are the same ? They don't look the same to
>>> me - only the first three are the same.
>
>> He used the multiprocessing.Pool object. There is a possible race
>> condition here: one or more of the forked processes may be
Gael Varoquaux wrote:
> Hi there,
>
> I have been using the multiprocessing module a lot to do statistical tests
> such as Monte Carlo or resampling, and I have just discovered something
> that makes me wonder if I haven't been accumulating false results. Given
> two files:
>
> === test.py ===
> fr
On Thu, Dec 11, 2008 at 05:36:47PM +0100, Gael Varoquaux wrote:
> b) /dev/urandom was used to seed. This seems wrong. Reading the code
> shows no /dev/urandom in the seeding parts.
Actually, I am wrong here. /dev/urandom is indeed used in 'rk_devfill',
which is used in the seeding routine. It seem
On Thu, Dec 11, 2008 at 10:20:48AM -0600, Bruce Southey wrote:
> Part of this is one of the gotchas of simulation that is not specific
> to multiprocessing and Python. It is just highly likely to occur in your case
> with multiprocessing, but it does occur in single processing too. As David
> indicated, man
On Thu, Dec 11, 2008 at 05:23:12PM +0100, Sturla Molden wrote:
> On 12/11/2008 4:57 PM, David Cournapeau wrote:
> > Why do you say the results are the same ? They don't look the same to
> > me - only the first three are the same.
> He used the multiprocessing.Pool object. There is a possible race
On Fri, Dec 12, 2008 at 12:57:26AM +0900, David Cournapeau wrote:
> > [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([
> > 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964,
> > 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382,
> > 0
On 12/11/2008 4:57 PM, David Cournapeau wrote:
> Why do you say the results are the same ? They don't look the same to
> me - only the first three are the same.
He used the multiprocessing.Pool object. There is a possible race
condition here: one or more of the forked processes may be doing
not
Fri, 12 Dec 2008 00:57:26 +0900, David Cournapeau wrote:
[clip]
> On Fri, Dec 12, 2008 at 12:20 AM, Gael Varoquaux wrote:
> [clip]
>> Now I understand why this is the case: the different instances of the
>> random number generator were created by forking from the same process,
>> so they are exact
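The cause described above can be reproduced without a Pool: numpy seeds its global RandomState when it is imported, and `os.fork` copies that state, so every child starts from the same point. A minimal POSIX-only sketch; `draw_in_child` is a hypothetical helper that forks once and returns the child's draws through a pipe.

```python
import os
import numpy as np

def draw_in_child(n):
    """Fork once; the child draws n uniforms from the global state and pipes them back."""
    r, w = os.pipe()
    pid = os.fork()  # POSIX only
    if pid == 0:  # child: inherits an exact copy of the parent's RandomState
        os.close(r)
        os.write(w, np.random.random_sample(n).tobytes())
        os._exit(0)  # exit without running the parent's cleanup code
    os.close(w)
    buf = b""
    while len(buf) < 8 * n:
        part = os.read(r, 8 * n - len(buf))
        if not part:  # EOF: the child exited early
            break
        buf += part
    os.close(r)
    os.waitpid(pid, 0)
    return np.frombuffer(buf, dtype=np.float64)

if __name__ == "__main__":
    a, b = draw_in_child(4), draw_in_child(4)
    print(a, b, np.array_equal(a, b))
```

Since the parent has not drawn anything between the two forks, both children return identical arrays, and so would the parent's own next draw.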
On Fri, Dec 12, 2008 at 12:57 AM, David Cournapeau <[EMAIL PROTECTED]> wrote:
> Taking a look at the mtrand code in numpy, if the seed is not given,
> it is taken from /dev/random if available, or the time clock if not; I
> don't know what the semantics are for concurrent access to /dev/random
> (
On Fri, Dec 12, 2008 at 12:20 AM, Gael Varoquaux
<[EMAIL PROTECTED]> wrote:
> Hi there,
>
> I have been using the multiprocessing module a lot to do statistical tests
> such as Monte Carlo or resampling, and I have just discovered something
> that makes me wonder if I haven't been accumulating fals
Hi there,
I have been using the multiprocessing module a lot to do statistical tests
such as Monte Carlo or resampling, and I have just discovered something
that makes me wonder if I haven't been accumulating false results. Given
two files:
=== test.py ===
from test_helper import task
from multip