Re: [Bioc-devel] reproducible with mclapply?

2015-06-05 Thread Ramon Diaz-Uriarte
On Thu, 04-06-2015, at 15:50, Kasper Daniel Hansen wrote: > Note you're not guaranteed that two random streams starting with different > seeds will be (approximately) independent, so the suggestion on SO makes > the numbers reproducible but technically wrong. > But whether or not that is a pr

Re: [Bioc-devel] reproducible with mclapply?

2015-06-04 Thread Kasper Daniel Hansen
You're ignoring the fact that some random number generators should never be used inside of mclapply(), period. You should add that to your post and you should show how to set the random number generator appropriately. You seem to be only focusing on reproducibility of the code and not correctness

Re: [Bioc-devel] reproducible with mclapply?

2015-06-04 Thread Vladislav Petyuk
The only bad thing I see so far in using set.seed inside the function is that it interferes with the seed previously set by the user, so follow-up stochastic computation will be out of the user's control. Perhaps there are other undesirable effects that I do not see at this point. I tweaked the solution a bi

Re: [Bioc-devel] reproducible with mclapply?

2015-06-04 Thread Valerie Obenchain
I'll add a section to the BiocParallel docs. Valerie On 06/04/2015 07:55 AM, Kasper Daniel Hansen wrote: Yes, based on the documentation that particular random stream generator would work with mclapply. This is absolutely a subject which ought to be covered in the BiocParallel documentation.

Re: [Bioc-devel] reproducible with mclapply?

2015-06-04 Thread Kasper Daniel Hansen
Yes, based on the documentation that particular random stream generator would work with mclapply. This is absolutely a subject which ought to be covered in the BiocParallel documentation. And commenting on another set of recommendations: please NEVER use set.seed inside a function. Unfortunatel

Re: [Bioc-devel] reproducible with mclapply?

2015-06-04 Thread Vincent Carey
It does appear to me that the doRNG vignette sec 1.1 describes a solution to the problem posed. It is less clear to me that this method is readily adopted with BiocParallel unless registerDoPar is in use. Should we address this topic explicitly in the vignette? On Thu, Jun 4, 2015 at 9:50 AM,

Re: [Bioc-devel] reproducible with mclapply?

2015-06-04 Thread Kasper Daniel Hansen
Note you're not guaranteed that two random streams starting with different seeds will be (approximately) independent, so the suggestion on SO makes the numbers reproducible but technically wrong. If you want true independence you either need to use a parallel version of the random number generator
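
As a rough illustration of the "parallel version of the random number generator" idea, the sketch below gives each task its own L'Ecuyer-CMRG stream via nextRNGStream() from the parallel package. The names seeds, run_task and n_cores are hypothetical and chosen only for this example, the per-task stream layout is an assumption rather than something prescribed in the thread, and note that RNGkind() changes the generator for the rest of the session.

```r
library(parallel)

# Forking is unavailable on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

RNGkind("L'Ecuyer-CMRG")   # generator designed to provide multiple streams
set.seed(1)                # seed chosen by the user, at top level

# Build one stream seed per task (not per worker), so that results do not
# depend on how many cores are used.
n_tasks <- 10
seeds <- vector("list", n_tasks)
s <- get(".Random.seed", envir = globalenv())
for (i in seq_len(n_tasks)) {
  seeds[[i]] <- s
  s <- nextRNGStream(s)
}

# Hypothetical task: install the task's stream, then draw from it.
# Under forking this only changes the RNG state of the child process.
run_task <- function(seed) {
  assign(".Random.seed", seed, envir = globalenv())
  sample(1:10)
}

res_par <- mclapply(seeds, run_task, mc.cores = n_cores, mc.set.seed = FALSE)
res_seq <- lapply(seeds, run_task)   # same streams, run sequentially
identical(res_par, res_seq)
```

Because the streams are attached to tasks rather than to workers, changing n_cores should not change the draws.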

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Vladislav Petyuk
There are different ways set.seed can be used. The way it is suggested on the aforementioned stackoverflow post is basically a two stage process. First, a seed is provided by the user (set.seed(1)). That is, the user can change the outcome from run to run. Based on that seed, a vector of randomized seeds
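
The two-stage process described here might look roughly like the sketch below. It is not the code from the stackoverflow post; task_seeds, draw and n_cores are hypothetical names. As noted elsewhere in this thread, streams started from different seeds are not guaranteed to be independent, and calling set.seed inside a function overwrites the caller's RNG state when the function runs in the main process.

```r
library(parallel)

# Forking is unavailable on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Stage 1: the user picks the top-level seed.
set.seed(1)

# Stage 2: derive one seed per task from that seed.
task_seeds <- sample.int(.Machine$integer.max, 10)

# Hypothetical task: reseed with the task's own seed, then draw.
draw <- function(seed) {
  set.seed(seed)
  sample(1:10)
}

res_par <- mclapply(task_seeds, draw, mc.cores = n_cores)
res_seq <- lapply(task_seeds, draw)   # note: this reseeds the main process
identical(res_par, res_seq)
```

This makes the output reproducible regardless of the number of cores, but it does not address the independence concern.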

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Yu, Guangchuang
There is one possible solution posted at http://stackoverflow.com/questions/30610375/how-to-run-permutations-using-mclapply-in-a-reproducible-way-regardless-of-numbe/30627984#30627984 . As Kasper suggested, it's not proper to use set.seed inside a package. I suggest using a parameter for ex

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Kasper Daniel Hansen
For this situation, generate the permutation indexes outside of the mclapply, and then do mclapply over a list with the indices. And btw, please don't use set.seed inside a package; that control should completely be left to the user. Best, Kasper On Wed, Jun 3, 2015 at 7:08 AM, Vincent Carey wr
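
A minimal sketch of this suggestion follows, with made-up data and hypothetical names (make_perms, stat, x, g, n_cores). All random numbers are drawn in the main process, so the workers only do deterministic work and the result cannot depend on mc.cores.

```r
library(parallel)

# Forking is unavailable on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical helper: draw every permutation index up front,
# in the main process, before any parallel work starts.
make_perms <- function(n_perm, n) {
  lapply(seq_len(n_perm), function(i) sample.int(n))
}

x <- c(5.1, 4.9, 6.2, 5.8, 7.0, 6.5)   # made-up measurements
g <- c(1, 1, 1, 2, 2, 2)               # made-up group labels

set.seed(0)                            # seed set by the user, at top level
perms <- make_perms(10, length(x))

# Deterministic statistic computed on a precomputed permutation.
stat <- function(idx) mean(x[idx][g == 1]) - mean(x[idx][g == 2])

res_par <- unlist(mclapply(perms, stat, mc.cores = n_cores))
res_seq <- unlist(lapply(perms, stat))
identical(res_par, res_seq)
```

The cost is that all indices are held in memory at once, which may matter for very large numbers of permutations.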

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Vincent Carey
This document indicates how to achieve reproducibility independent of the underlying physical environment. http://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf Let me know if that satisfies the question. On Wed, Jun 3, 2015 at 5:32 AM, Yu, Guangchuang wrote: > Dear Vincent, > > RNGk

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Yu, Guangchuang
Dear Vincent, RNGkind("L'Ecuyer-CMRG") works like using mc.set.seed=FALSE. When mc.cores changes, the output is not reproducible. I think this issue is also of concern within the Bioconductor community, as parallel versions of permutation tests are commonly used now. Best Regards, Guangchuang On W

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Vincent Carey
Hi, this question belongs on R-help, but perhaps https://stat.ethz.ch/R-manual/R-devel/library/parallel/html/RngStream.html will be useful. Best regards On Wed, Jun 3, 2015 at 3:11 AM, Yu, Guangchuang wrote: > Dear all, > > I have an issue of setting seed value when using parallel package. >

[Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Yu, Guangchuang
Dear all, I have an issue with setting the seed value when using the parallel package.
> library("parallel")
> library("digest")
>
> set.seed(0)
> m <- mclapply(1:10, function(x) sample(1:10),
+ mc.cores=2)
> digest(m, 'crc32')
[1] "4827c80c"
>
> set.seed(0)
> m <- mclapply(1:10, function(x)