Hello Anna,
The speed of parallel computing depends on many factors. To avoid any
potential confounders, please try to use this code for timing (assuming you
still have all the variables you used in your example):
```
parallel_param <- SnowParam(workers = ncores, type = "SOCK", tasks = ...)
```
My motivation for using distributed memory was that my package is also
accessible on Windows. Is it better to use shared memory as default but
check the user's system and then switch to socket only if necessary?
Regarding the real data: I have 68 samples (rows) of methylation EPIC array
data.
Dear Anna,
According to the documentation of "BiocParallelParam", SnowParam() is a
subclass suitable for distributed memory (e.g. cluster) computing. If you're
running your code on a simpler machine with shared memory (e.g. your PC),
you're probably better off using MulticoreParam() instead.
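A minimal sketch of that decision (ncores is illustrative here, standing in for whatever your package exposes; note that bpparam() already returns a sensible platform-specific default):
```
library(BiocParallel)
ncores <- 2  # illustrative; use whatever your package exposes
## forked shared memory where available, socket workers on Windows
param <- if (.Platform$OS.type == "windows") {
    SnowParam(workers = ncores)
} else {
    MulticoreParam(workers = ncores)
}
res <- bplapply(1:4, sqrt, BPPARAM = param)
```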
Hi all!
I'm switching from the base R *parallel* package to *BiocParallel* for my
Bioconductor submission and I have two questions. First, I wanted advice on
whether I've implemented load balancing correctly. Second, I've noticed
that the running time is about 15% longer with BiocParallel. Any
nice day
> Giulia
> From: Martin Morgan
> Date: Thursday, July 7, 2022 at 14:28
> To: Giulia Pais, Henrik Bengtsson
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] BiocParallel and Shiny
> I think it should be straight-forw
, sorry.
Thank you
From: Vincent Carey
Date: Thursday, July 7, 2022 at 11:40
To: Giulia Pais
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] BiocParallel and Shiny
Interesting question. Have you looked at
https://shiny.rstudio.com/articles/progress.html ...? There is
also a file called
Hello,
I have a question on the use of BiocParallel with Shiny: I would like to show a
progress bar on the UI, much like the standard progress bar that can be set in
functions like bplapply(). Is it possible to do it, and how? I haven't found
anything on the topic in the documentation.
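One workable pattern, sketched here rather than taken from the thread: chunk the input and advance shiny::withProgress() between bplapply() calls, so the UI updates once per chunk. All names below are illustrative, and withProgress()/incProgress() must run inside a Shiny server context.
```
library(shiny)
library(BiocParallel)

bplapply_with_progress <- function(X, FUN, chunks = 10, BPPARAM = bpparam()) {
    idx <- parallel::splitIndices(length(X), chunks)
    withProgress(message = "Computing...", value = 0, {
        res <- lapply(idx, function(i) {
            out <- bplapply(X[i], FUN, BPPARAM = BPPARAM)
            incProgress(1 / length(idx))   # one UI update per chunk
            out
        })
    })
    unlist(res, recursive = FALSE)
}
```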
Good day,
I am not sure how to fix my package properly, even with the good example. A
link to the specific part of my function is
https://github.com/DarioS/ClassifyR/blob/e35899caceb401691990136387a517f4c3b57d5e/R/runTests.R#L567
and the example in the help page of runTestsEasyHard function
The question is a bit abstract for me to understand and it might be better to
point to actual code in a git repository or similar...
Inside a package, something like
fun = function(x, y, ...) {
    c(x, y, length(list(...)))
}
user_visible <- function(x, ...) {
    y = 1
Good day,
Thanks for the examples which demonstrate the issue. Do you have other
recommendations if, inside the loop, another function in the package is being
called and the variable being passed is the ellipsis? There are only a couple
of variables which might be provided by the user
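A sketch along the lines of Martin's suggestion (internal_fun is a stand-in for the package function being called): capture the caller's ... into a list once, then pass both the list and the function to the workers as explicit arguments, so nothing has to be found in the calling frame.
```
library(BiocParallel)
internal_fun <- function(xi, a = 1, b = 2) xi + a + b   # stand-in

user_visible <- function(x, ..., BPPARAM = bpparam()) {
    dots <- list(...)                    # capture user-supplied args once
    bplapply(x, function(xi, dots, f) do.call(f, c(list(xi), dots)),
             dots = dots, f = internal_fun, BPPARAM = BPPARAM)
}

user_visible(1:3, a = 10)
```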
Windows uses separate processes that do not share memory (SnowParam()), whereas
Linux / macOS by default use forked processes that share the original memory
(MulticoreParam()). So
> y = 1
> param = MulticoreParam()
> res = bplapply(1:2, function(x) y, BPPARAM=param)
works because the function
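With SnowParam() the same lookup fails, because a socket worker never sees the calling environment; the usual fix, sketched here, is to pass the variable as an explicit argument:
```
library(BiocParallel)
y <- 1
param <- SnowParam(2)
## bplapply(1:2, function(x) y, BPPARAM = param)  # object 'y' not found
res <- bplapply(1:2, function(x, y) y, y = y, BPPARAM = param)
```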
Good day,
I have a loop in a function of my R package which by default uses bpparam() to
set the framework used for parallelisation. On Windows, I see the error
Error: BiocParallel errors
element index: 1, 2, 3, 4, 5, 6, ...
first error: object 'selParams' not found
This error does not
yes it would be useful to post this to R-devel as a 'using
parallel::makeCluster()' question, removing BiocParallel from the
equation, where some general insight might be had...
Martin
On 06/13/2018 05:00 PM, Dario Strbenac wrote:
Good day,
I couldn't get a working param object. It never
Good day,
I couldn't get a working param object. It never completes the command
param = bpstart(SnowParam(2, manager.hostname = "144.130.152.1", manager.port =
2559))
I obtained the IP address by typing "My IP address" into Google and it gave me
the address shown. I used netstat -an and
It's more likely that it never starts, probably because it tries to
create socket connections on ports that are not available, or perhaps
because the file path to the installed location of the BiocParallel
package is on a network share, or the 'master' node needs to be
specified with an IP
Good day,
I was interested in how my package performs on a 32-bit Windows
computer because I'm going to give a workshop about it soon and some people
might bring old laptops. I found that using SnowParam with workers set to more
than 1 never finishes. The minimal code to cause the
On 01/31/2018 06:39 PM, Ludwig Geistlinger wrote:
Hi,
I am currently considering the following snippet:
data.ids <- paste0("d", 1:5)
f <- function(x) paste("dataset", x, sep=" = ")
res <- BiocParallel::bplapply(data.ids, function(d) f(d))
Using a recent R-devel on both a Linux machine
Hi,
I am currently considering the following snippet:
> data.ids <- paste0("d", 1:5)
> f <- function(x) paste("dataset", x, sep=" = ")
> res <- BiocParallel::bplapply(data.ids, function(d) f(d))
Using a recent R-devel on both a Linux machine and a Mac machine, this works
fine.
However, on
January 19, 2018 4:10 PM
To: Ludwig Geistlinger; Gabe Becker; Vincent Carey
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image
is malformed
On 01/19/2018 02:24 PM, Ludwig Geistlinger wrote:
> I apologize if I haven't been specific enough - h
Public Health
From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Martin Morgan
<martin.mor...@roswellpark.org>
Sent: Friday, January 19, 2018 1:54 PM
To: Gabe Becker; Vincent Carey
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-
.@roswellpark.org>
Sent: Friday, January 19, 2018 1:54 PM
To: Gabe Becker; Vincent Carey
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image
is malformed
On 01/19/2018 12:37 PM, Gabe Becker wrote:
> IT seems like you could also force a copy
On 01/19/2018 12:23 PM, Vincent Carey wrote:
good question
some of the discussion on
http://sqlite.1065341.n5.nabble.com/Parallel-access-to-read-only-in-memory-database-td91814.html
seems relevant.
converting the relatively small annotation package content to pure R
read-only tables on the
IT seems like you could also force a copy of the reference object via
$copy() and then force a refresh of the conn slot by assigning a
new db connection into it.
I'm having trouble confirming that this would work, however, because I
actually can't reproduce the error. The naive way works for me
good question
some of the discussion on
http://sqlite.1065341.n5.nabble.com/Parallel-access-to-read-only-in-memory-database-td91814.html
seems relevant.
converting the relatively small annotation package content to pure R
read-only tables on the master before parallelizing
might be very
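A sketch of that approach, using hgu133a.db as in the original post: run the SQLite-backed query once on the master, then parallelize over the resulting plain data.frame so the workers never touch the shared database connection.
```
library(BiocParallel)
library(AnnotationDbi)
library(hgu133a.db)

## one query on the master; the result is an ordinary data.frame
map <- AnnotationDbi::select(hgu133a.db, keys = keys(hgu133a.db),
                             columns = "SYMBOL")

## workers operate on plain R objects, not the SQLite connection
res <- bplapply(split(map$PROBEID, map$SYMBOL), length)
```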
Hi,
Within a package I am developing, I would like to enable parallel probe-to-gene
mapping for a compendium of microarray datasets.
This accordingly makes use of annotation packages such as hgu133a.db, which in
turn connect to the SQLite database via AnnotationDbi.
When running in multi-core
On 12/30/2017 04:08 PM, Ludwig Geistlinger wrote:
Hi,
I'm currently playing around with progress bars in BiocParallel - which is a
great package! ;-)
For demonstration, I'm using the example code from DESeq2::DESeq.
library(DESeq2)
library(BiocParallel)
f <- function(mu)
{
cnts <-
Hi,
I'm currently playing around with progress bars in BiocParallel - which is a
great package! ;-)
For demonstration, I'm using the example code from DESeq2::DESeq.
library(DESeq2)
library(BiocParallel)
f <- function(mu)
{
cnts <- matrix(rnbinom(n=1000, mu=mu, size=1/0.5), ncol=10)
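For console use (as opposed to a Shiny UI), the backends accept a progressbar argument; a minimal sketch:
```
library(BiocParallel)
param <- SnowParam(workers = 2, progressbar = TRUE)
res <- bplapply(1:8, function(i) { Sys.sleep(0.5); i }, BPPARAM = param)
```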
from example(bplapply), after register(MulticoreParam(2, timeout=5))
bplppl> bplapply(1:10, fun)
Error in socketConnection(host, port, TRUE, TRUE, "a+b", timeout =
timeout) :
  cannot open the connection
In addition: Warning message:
In socketConnection(host, port, TRUE, TRUE, "a+b",
wrote:
Hi Robert,
Thanks for reporting the bug. The problem was with how 'X' was split
before dispatching to bplapply() and affected both SerialParam and
SnowParam. Now fixed in release (1.2.21) and devel (1.3.52).
Valerie
- Forwarded Message -
From: "Robert Castelo"<robert.cast...@up
).
Valerie
- Forwarded Message -
From: "Robert Castelo" <robert.cast...@upf.edu>
To: bioc-devel@r-project.org
Sent: Wednesday, September 2, 2015 8:12:33 AM
Subject: [Bioc-devel] BiocParallel::bpvec() and DNAStringSet objects, problem
hi,
I have encountered a problem when
" <robert.cast...@upf.edu>
> To: bioc-devel@r-project.org
> Sent: Wednesday, September 2, 2015 8:12:33 AM
> Subject: [Bioc-devel] BiocParallel::bpvec() and DNAStringSet objects, problem
>
> hi,
>
> I have encountered a problem when using the bpvec() function from the
hi,
I have encountered a problem when using the bpvec() function from the
BiocParallel package with DNAStringSet objects and the "SerialParam"
backend:
library(Biostrings)
library(BiocParallel)
## all correct when using the multicore backend
bpvec(X=DNAStringSet(c("AC", "GT")),
Hi Valerie,
Excellent. In addition to collecting log outputs, I have a few more
suggestions that may be worth considering:
- Collecting the results from parallel computing tasks directly in an R
object is a great convenience, which I like a lot. However, in the
context of slow computations
On Thu, Nov 20, 2014 at 12:17 PM, Thomas Girke thomas.gi...@ucr.edu wrote:
Hi Valerie,
Excellent. In addition to collecting log outputs, I have a few more
suggestions that may be worth considering:
- Collecting the results from parallel computing tasks directly in an R
object is a great
Hi Valerie, Michel and others,
Finally, I freed up some time to revisit this problem. As it turns out,
it is related to the use of a module system on our cluster. If I add in
the template file for Torque (torque.tmpl) an explicit module load line
for the specific R version I am using on the
Hi Michel,
In BiocParallel 0.99.24 .convertToSimpleError() now checks for NULL and
converts to NA_character_.
I'm testing with BatchJobs 1.4, BiocParallel 0.99.24 and SLURM. I'm
still not getting an informative error message:
xx <- bplapply(1:2, FUN)
SubmitJobs
This was a bug in BatchJobs::waitForJobs(). We now throw an error if
jobs disappear due to a faulty template file. I'd appreciate it if you
could confirm that this is now correctly caught and handled on your
system. I furthermore suggest replacing NULL with NA_character_ in
.convertToSimpleError().
Hi,
Martin and I looked into this a bit. It looks like a problem with
handling an 'undefined error' returned from a worker (i.e., job did not
run). When there is a problem executing the tmpl script no error message
is sent back. The NULL is coerced to simpleError and becomes a problem
Hi Thomas,
Just wanted to let you know I saw this and am looking into it.
Valerie
On 09/20/2014 02:54 PM, Thomas Girke wrote:
Hi Martin, Michael and Vincent,
If I run the following code, with the release version of BiocParallel then it
works (took me some time to actually realize that), but
Hi guys,
We often need to iterate over the Cartesian product of two dimensions, like
sample X chromosome. This is preferable to nested iteration, which is
complicated. I've been using expand.grid and bpmapply for this, but it
seems like this could be made easier. Like bpmapply could gain a
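A minimal sketch of the expand.grid() + bpmapply() pattern described above, with toy dimensions:
```
library(BiocParallel)
samples <- c("s1", "s2")
chroms  <- c("chr1", "chr2", "chr3")
grid <- expand.grid(sample = samples, chrom = chroms,
                    stringsAsFactors = FALSE)
res <- bpmapply(function(s, chr) paste(s, chr, sep = ":"),
                grid$sample, grid$chrom, BPPARAM = SerialParam())
```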
Streamer package has DAGTeam/DAGParam components that I believe are
relevant.
An abstraction of the reduction plan for a parallelized task would seem to
have a natural
home in BatchJobs.
On Thu, Nov 14, 2013 at 8:15 AM, Michael Lawrence lawrence.mich...@gene.com
wrote:
Hi guys,
We often
We use a design iterator in BatchExperiments::makeDesign for a Cartesian
product. I found an old version of designIterator (cf.
https://github.com/tudo-r/BatchExperiments/blob/master/R/designs.R) w/o
the optional data.frame input which is easier to read:
https://gist.github.com/mllg/7469844.
I like the general idea of having iterators; was just checking out the
itertools package after not having looked at it for a while. I could see
having a BiocIterators package, and a bpiterate(iterator, FUN, ...,
BPPARAM). My suggestion was simpler though. Right now, bpmapply runs a
single job per
On 11/04/2013 11:34 AM, Michael Lawrence wrote:
The dynamic nature of R limits the extent of these checks. But as Ryan has
noted, a simple sanity check goes a long way. If what he has done could be
extended to the rest of the search path (people always forget to attach
packages), I think we've
The 'foreach' framework does this sort of analysis using codetools at
least in part. You may be able to build on what they have.
luke
On Mon, 4 Nov 2013, Ryan wrote:
On 11/4/13, 11:05 AM, Gabriel Becker wrote:
As a side note, I'm not sure that existence of a symbol is sufficient (it
Actually, the check that I proposed is only supposed to check for usage
of user-defined variables, not variables from packages. Truthfully,
though, I guess I'm not the right person to work on this, since in
practice I use forked processes for the vast majority of my inside-R
parallelization,
Weird, I guess it requires being logged in or something. I don't know if the
issue is that it's in a non-master branch or what. The repo is fully public
and the forCRAN_0.3.5 branch definitely exists on GitHub.
I started Chrome (where I'm not logged into GitHub) and got the same 404
error but
The code that I wrote intentionally avoids checking for package variables,
since I consider that a separate problem. Package variables can be provided
to the child by loading the package, whereas user-defined variables must be
serialized in the parent and sent to the child.
I think I could fairly
Ryan,
I agree that in some sense it is a different problem, but my point is with
a different approach we can easily answer both. The code I posted returns a
named character vector of symbol names, with the package name as the name.
This makes it a trivial lookup to determine both a) what symbols
On 11/4/13, 11:05 AM, Gabriel Becker wrote:
As a side note, I'm not sure that existence of a symbol is sufficient (it
certainly is necessary). What about situations where the symbol exists but
is stale compared to the value in the parent? Are we sure that can never
happen?
I think this is a
Hi,
in BiocParallel, is there a suggested (or planned) best standard for
making *locally* assigned variables (e.g. functions) available to the
applied function when it runs in a separate R process (which will be
the most common use case)? I understand that avoiding local variables
should be
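For reference, the base R mechanism being alluded to in the replies below (a sketch):
```
library(parallel)
cl <- makeCluster(2)
offset <- 10                    # a locally assigned variable
clusterExport(cl, "offset")     # ship it to the workers explicitly
res <- parLapply(cl, 1:3, function(x) x + offset)
stopCluster(cl)
```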
An analog to clusterExport is a good idea. To make it even easier, we could
have a dynamic environment based on object tables that would catch missing
symbols and download them from the parent thread. But maybe there's some
benefit to being explicit?
Michael
On Sun, Nov 3, 2013 at 12:39 PM,
On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
lawrence.mich...@gene.com wrote:
An analog to clusterExport is a good idea. To make it even easier, we could
have a dynamic environment based on object tables that would catch missing
symbols and download them from the parent thread. But maybe
Here's an easy thing we can add to BiocParallel in the short term. The
following code defines a wrapper function withBPExtraErrorText that
simply appends an additional message to the end of any error that looks
like it is about a missing variable. We could wrap every evaluation in
a similar
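A rough sketch of that wrapper (withBPExtraErrorText is the name proposed in the message, not an existing BiocParallel function):
```
withBPExtraErrorText <- function(expr) {
    tryCatch(expr, error = function(e) {
        msg <- conditionMessage(e)
        if (grepl("object '.+' not found", msg))
            msg <- paste0(msg, "\n  (was this variable exported to the",
                          " worker processes?)")
        stop(msg, call. = FALSE)
    })
}
```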
Another potential easy step we could take: if FUN is a function in the
user's workspace, we automatically export that function under the same
name in the children. This would make recursive functions just work, but
it might be a bit too magical.
On 11/3/13, 2:38 PM, Ryan wrote:
Here's an easy
Henrik,
See https://github.com/duncantl/CodeDepends (as used by
https://github.com/gmbecker/RCacheSuite). It will identify necessarily
defined symbols (input variables) for code that is not doing certain tricks
(e.g. get(), mixing data.frame columns and global variables in formulas, etc.).
I guess all we need to do is to detect whether a function would try to
access a free variable in the user's workspace, and warn/error if so.
It looks like CodeDepends could do that. I could try to come up with an
implementation. I guess we would add CodeDepends as an optional
dependency for
Ryan (et al),
FYI:
> f <- function() {
+     x = rnorm(x)
+     x
+ }
> findGlobals(f)
[1] "="     "{"     "rnorm"
x should be in the list of globals but it isn't.
~G
sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3]
Ok, here is my attempt at a function to get the list of user-defined
free variables that a function refers to:
https://gist.github.com/DarwinAwardWinner/7298557
It uses codetools, so it is subject to the limitations of that package,
but for simple examples, it successfully detects when a
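A sketch of the kind of check the gist implements, using codetools (the function name is illustrative):
```
library(codetools)
## flag free variables in FUN that resolve to the user's workspace
warn_user_globals <- function(FUN) {
    vars <- findGlobals(FUN, merge = FALSE)$variables
    user <- vars[vapply(vars, exists, logical(1),
                        envir = globalenv(), inherits = FALSE)]
    if (length(user))
        warning("FUN uses workspace variables: ",
                paste(user, collapse = ", "))
    invisible(user)
}
```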
On 09/03/2013 05:25 AM, Hahne, Florian wrote:
Hi List, Martin,
I just wanted to quickly ask about the status of the BiocParallel package and
the cluster support in particular. Is this project finished? And are there plans
to have BiocParallel as a proper package again, or will it remain a GIT
Hi Florian,
Yes you're absolutely right. The fork currently depends on some
functions which are not yet included in the CRAN build. For now you
can get the latest development version on
http://batchjobs.googlecode.com. We'll upload a new version of
BatchJobs soon. I've documented this as an issue
On Thu, Jun 6, 2013 at 1:56 PM, Henrik Bengtsson h...@biostat.ucsf.edu wrote:
Hi, I'd like to pick up the discussion on a BatchJobs backend for
BiocParallel where it was left back in Dec 2012 (Bioc-devel thread
'BiocParallel'
And here is the on-going development of the backend:
https://github.com/mllg/BiocParallel/tree/batchjobs
Not sure how well it's been tested.
Kudos to Michel Lang for making so much progress so quickly.
Michael
On Thu, Jun 6, 2013 at 1:59 PM, Dan Tenenbaum dtene...@fhcrc.org wrote:
On Thu,
On Tue 04 Dec 2012 11:31:59 AM PST, Michael Lawrence wrote:
The name pvec is not very intuitive. What about bpchunk? And since the
function passed to bpvectorize is already vectorized, maybe bpvectorize
should be bparallelize? I know everyone has different
intuitions/preferences when it comes to
In reply to:
On 11/16/2012 09:45 PM, Steve Lianoglou wrote:
But then you have the situation of multi-machines w/ multiple cores --
is this (2) or (3) here? How do you explicitly write code for that w/
foreach mojo? I guess the answer to that is that you let your grid
engine (or whatever your
This sounds very useful when mixing batch jobs with an interactive session.
In fact, it's something I was planning to do, since I noticed their
execution model is completely asynchronous. Is it actually a new cluster
backend for the parallel package?
Michael
On Fri, Nov 16, 2012 at 12:18 AM,
I'm not sure I understand the appeal of foreach. Why not do this within the
functional paradigm, i.e., parLapply?
Michael
On Fri, Nov 16, 2012 at 9:41 AM, Ryan C. Thompson r...@thompsonclan.org wrote:
You could write a %dopar% backend for the foreach package, which would
allow any code using
To be more specific, instead of:
library(parallel)
cl <- ...  # Make a cluster
parLapply(cl, X, fun, ...)
you can do:
library(parallel)
library(doParallel)
library(plyr)
cl <- ...
registerDoParallel(cl)
llply(X, fun, ..., .parallel=TRUE)
On Fri 16 Nov 2012 11:44:06 AM PST, Ryan C. Thompson
On Fri, Nov 16, 2012 at 11:44 AM, Ryan C. Thompson r...@thompsonclan.org wrote:
You don't have to use foreach directly. I use foreach almost exclusively
through the plyr package, which uses foreach internally to implement
parallelism. Like you, I'm not particularly fond of the foreach syntax
On 11/15/2012 6:21 AM, Kasper Daniel Hansen wrote:
I'll second Ryan's patch (at least in principle). When I parallelize
across multiple cores, I have always found mc.preschedule to be an
important option to expose (that, and the number of cores, is all I
use routinely).
Yes, Ryan provided a
Personally, having used memcached in the past for distributed shared memory
caching, I am most interested in 3) and doRedis. Many cluster/batch
processing systems are a colossal PITA, and a worker queue would go a long
way towards fixing that. Less checkpointing, more results... I hope.
As an
should approaches to fault-tolerance/recovery/debugging be a topic here?
On Thu, Nov 15, 2012 at 1:53 PM, Henrik Bengtsson h...@biostat.ucsf.edu wrote:
Is there any write up/discussion/plans on the various types of
parallel computations out there:
(1) one machine / multi-core/multi-threaded
On Thu, Nov 15, 2012 at 11:00 AM, Martin Morgan mtmor...@fhcrc.org wrote:
On 11/15/2012 10:53 AM, Henrik Bengtsson wrote:
Is there any write up/discussion/plans on the various types of
parallel computations out there:
(1) one machine / multi-core/multi-threaded
(2) multiple machines /
On Wed, Nov 14, 2012 at 12:23 PM, Martin Morgan mtmor...@fhcrc.org wrote:
Interested developers -- I added the start of a BiocParallel package to
the Bioconductor subversion repository and build system.
The package is mirrored on github to allow for social coding; I encourage
people to
On 11/14/2012 03:43 PM, Ryan C. Thompson wrote:
Here are two alternative implementations of pvec. pvec2 is just a simple rewrite
of pvec to use mclapply. pvec3 then extends pvec2 to accept a specified chunk
size or a specified number of chunks. If the number of chunks exceeds the number
of
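A condensed sketch of the pvec2 idea (the name comes from the message; the actual patch is in the pull request):
```
library(parallel)
pvec2 <- function(v, FUN, ..., mc.cores = 2L) {
    idx <- splitIndices(length(v), mc.cores)   # one chunk per core
    ## fork-based, so use mc.cores = 1L on Windows
    res <- mclapply(idx, function(i) FUN(v[i], ...), mc.cores = mc.cores)
    do.call(c, res)
}
pvec2(1:10, sqrt)
```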
I just submitted a pull request. I'll add tests shortly if I can figure
out how to write them.
On Wed 14 Nov 2012 03:50:36 PM PST, Martin Morgan wrote:
On 11/14/2012 03:43 PM, Ryan C. Thompson wrote:
Here are two alternative implementations of pvec. pvec2 is just a
simple rewrite
of pvec to