[Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Yu, Guangchuang
Dear all, I have an issue with setting the seed value when using the parallel package. > library("parallel") > library("digest") > > set.seed(0) > m <- mclapply(1:10, function(x) sample(1:10), + mc.cores=2) > digest(m, 'crc32') [1] "4827c80c" > > set.seed(0) > m <- mclapply(1:10, function(x)

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Vincent Carey
Hi, this question belongs on R-help, but perhaps https://stat.ethz.ch/R-manual/R-devel/library/parallel/html/RngStream.html will be useful. Best regards On Wed, Jun 3, 2015 at 3:11 AM, Yu, Guangchuang wrote: > Dear all, > > I have an issue with setting the seed value when using the parallel package. >

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Yu, Guangchuang
Dear Vincent, RNGkind("L'Ecuyer-CMRG") works, as does using mc.set.seed=FALSE. When mc.cores changes, however, the output is not reproducible. I think this issue is also of concern within the Bioconductor community, as parallel versions of permutation tests are commonly used now. Best Regards, Guangchuang On W
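One way to make the result independent of mc.cores, in the spirit of the RngStream page linked above, is to give every task (rather than every worker) its own L'Ecuyer-CMRG stream. The following is only a sketch under assumptions: the helper name run() and the task count of 10 are illustrative, and the worker function overwrites .Random.seed in the global environment, which is a side effect when mc.cores = 1. mclapply cannot fork on Windows, so the sketch falls back to one core there.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")
set.seed(0)  # the user's seed; everything below is derived from it

# Build one RNG stream per task by stepping nextRNGStream() ten times.
n_tasks <- 10
streams <- vector("list", n_tasks)
s <- get(".Random.seed", envir = .GlobalEnv)
for (i in seq_len(n_tasks)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)
}

# Hypothetical helper: each task installs its own stream before sampling,
# so the draw does not depend on which worker handles the task.
run <- function(cores) {
  mclapply(streams, function(stream) {
    assign(".Random.seed", stream, envir = .GlobalEnv)
    sample(1:10)
  }, mc.cores = cores)
}

cores <- if (.Platform$OS.type == "windows") 1L else 2L
identical(run(1L), run(cores))
```

Because the stream is tied to the task index, changing the number of cores only changes which process evaluates a task, not the random numbers it sees.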

[Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-03 Thread Ludwig Geistlinger
Dear Bioc annotation team, Querying TxDb.Hsapiens.UCSC.hg38.knownGene for gene coordinates, e.g. for BRCA1; ENSG00000012048; entrez:672 via > genes(TxDb.Hsapiens.UCSC.hg38.knownGene, vals=list(gene_id="672")) gives me: GRanges object with 1 range and 1 metadata column: seqnames

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Vincent Carey
This document indicates how to achieve reproducibility independent of the underlying physical environment. http://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf Let me know if that satisfies the question. On Wed, Jun 3, 2015 at 5:32 AM, Yu, Guangchuang wrote: > Dear Vincent, > > RNGk

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Kasper Daniel Hansen
For this situation, generate the permutation indexes outside of the mclapply, and then do mclapply over a list with the indices. And btw., please don't use set.seed inside a package; that control should completely be left to the user. Best, Kasper On Wed, Jun 3, 2015 at 7:08 AM, Vincent Carey wr
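The suggestion above can be sketched as follows. The toy data, the number of permutations, and the helper perm_stat() are assumptions made for illustration; the set.seed call stands in for whatever seed the user chooses at the top level. All random draws happen serially before mclapply is called, so the parallel step is fully deterministic.

```r
library(parallel)

# Hypothetical toy data: 20 values in two groups of 10.
set.seed(1)  # chosen by the user at top level, not inside a package
x <- rnorm(20)
group <- rep(c(0, 1), each = 10)

# Draw every permutation index serially, before any parallel work.
n_perm <- 100
perm_idx <- lapply(seq_len(n_perm), function(i) sample(length(x)))

# Hypothetical statistic: difference in group means after permuting x.
# It uses no random numbers, so worker RNG state is irrelevant.
perm_stat <- function(idx) {
  xp <- x[idx]
  mean(xp[group == 1]) - mean(xp[group == 0])
}

# mclapply cannot fork on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_a <- unlist(mclapply(perm_idx, perm_stat, mc.cores = 1L))
res_b <- unlist(mclapply(perm_idx, perm_stat, mc.cores = cores))
identical(res_a, res_b)
```

The cost of this design is memory: the full list of indices is held at once, which may matter for very large numbers of permutations.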

Re: [Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency

2015-06-03 Thread Robert M. Flight
Ludwig, If you do this search on the UCSC genome browser (which this annotation package is built from), you will see that the longest variant is what is shown http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg38&position=brca1&hgt.positionInput=brca1&hgt.suggestTrack=knownGene&S

[Bioc-devel] caOmicsV package

2015-06-03 Thread Zhang, Hongen (NIH/NCI) [E]
Good morning, All. We recently implemented the caOmicsV package, which is now available at http://bioconductor.org/packages/3.2/bioc/html/caOmicsV.html. The package provides methods to visualize multidimensional cancer genomic data with two layouts: a matrix layout and a combined biological network a

[Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-03 Thread Tim Triche, Jr.
It would be nice (for a number of reasons) to have chromosome lengths readily available in a foundational package like GenomeInfoDb, so that, say, data(seqinfo.hg19) seqinfo(myResults) <- seqinfo.hg19[ seqlevels(myResults) ] would work without issues. Is there any particular reason this couldn't

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-03 Thread Kasper Daniel Hansen
Let me rephrase this slightly. From one POV the purpose of GenomeInfoDb is to clean up the seqinfo slot. Currently it does most of the cleaning, but it does not add seqlengths. It is clear that seqlengths depend on the version of the genome, but I will argue so do the seqnames. Of course, for h

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-03 Thread Vincent Carey
I typically get this info from Homo.sapiens. The result is parasitic on the TxDb that is in there. I don't know how easy it is to swap an alternate TxDb in to get a different build. I think it would make sense to regard the OrganismDb instances as foundational for this sort of structural data. On

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-03 Thread Tim Triche, Jr.
Right, I typically do that too, and if you're working on human data it isn't a big deal. What makes things a lot more of a drag is when you work on e.g. mouse data (mm9 vs mm10, aka GRCm37 vs GRCm38) where Mus.musculus is essentially a "build ahead" of Homo.sapiens. R> seqinfo(Homo.sapiens) Seqin

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-03 Thread Vincent Carey
It really isn't hard to have multiple OrganismDb packages in place -- the process of making new ones is documented and was given as an exercise in the EdX course. I don't know if we want to institutionalize it and distribute such -- I think we might, so that there would be Hs19, Hs38, mm9, etc. pa

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-03 Thread Tim Triche, Jr.
That would be perfect actually. And it would radically reduce & modularize maintenance. Maybe that's the best way to go after all. Quite sensible. --t > On Jun 3, 2015, at 12:46 PM, Vincent Carey wrote: > > It really isn't hard to have multiple OrganismDb packages in place -- the > process

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Yu, Guangchuang
There is one possible solution posted at http://stackoverflow.com/questions/30610375/how-to-run-permutations-using-mclapply-in-a-reproducible-way-regardless-of-numbe/30627984#30627984 . As Kasper suggested, using set.seed inside a package is not proper. I suggest using a parameter for ex

Re: [Bioc-devel] reproducible with mclapply?

2015-06-03 Thread Vladislav Petyuk
There are different ways set.seed can be used. The way it is suggested in the aforementioned stackoverflow post is basically a two-stage process. First, a seed is provided by the user (set.seed(1)); that is, the user can change the outcome from run to run. Based on that seed, a vector of randomized seeds
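A minimal sketch of such a two-stage scheme follows. It is not the code from the stackoverflow post; the function name permute_reproducibly() and its arguments are hypothetical. Stage one draws one integer seed per task from the user's RNG state; stage two has each task re-seed itself, so the result does not depend on how tasks are distributed over cores.

```r
library(parallel)

# Hypothetical helper: n_perm permutations of 1:n, reproducible for any
# mc.cores given the RNG state the caller set beforehand.
permute_reproducibly <- function(n_perm, n, mc.cores = 1L) {
  # Stage 1: derive one seed per task from the user's current RNG state.
  seeds <- sample.int(.Machine$integer.max, n_perm)
  # Stage 2: each task seeds itself before drawing its permutation.
  mclapply(seeds, function(s) {
    set.seed(s)
    sample(n)
  }, mc.cores = mc.cores)
}

# mclapply cannot fork on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

set.seed(1)  # the only seed the user has to choose
a <- permute_reproducibly(10, 10, mc.cores = 1L)
set.seed(1)
b <- permute_reproducibly(10, 10, mc.cores = cores)
identical(a, b)
```

Note that the inner set.seed call still runs inside the function, so with mc.cores = 1 it alters the caller's RNG state; a package using this pattern would probably want to document or restore that state.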

Re: [Bioc-devel] Wishlist: on demand R CMD check

2015-06-03 Thread Steffen Neumann
Hi, On Di, 2015-06-02 at 13:39 +0100, Laurent Gatto wrote: > To what extent could the single package builder be used for such a > feature? This would not address Michael's point, but it is a way to get > access to all archs using existing software infrastructure. That would be great. I have been