On Mon, Oct 10, 2011 at 4:14 PM, Joshua Wiley wrote:
> I could be way off base here, but my concern about presplitting the data is
> that you will have your data, and a second copy of your data that is something
> like a list where each element contains the portion of the data for that
> split.
This is the sort of thing that should be measured, rather than
speculated about, but if you're using multicore all those subsets can
be made at the same time, not sequentially, so they add up to a copy
of the whole data. Using data.table rather than a data.frame would
help, of course.
I would gu
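A rough sketch of how the memory cost could be measured rather than guessed at. The data frame below is made up for illustration (its size, column names and group count are assumptions, not anything from the thread), and object.size() reports only an estimate:

```r
# Hypothetical data: 1e5 rows spread over 100 groups.
set.seed(1)
df <- data.frame(g = sample(100, 1e5, replace = TRUE), v = rnorm(1e5))

pieces <- split(df, df$g)                  # presplit list of subsets
idx    <- split(seq_len(nrow(df)), df$g)   # row indices only, no data copied

# Compare the estimated footprint of each representation.
print(object.size(df),     units = "Kb")
print(object.size(pieces), units = "Kb")
print(object.size(idx),    units = "Kb")
```

Running this on the real data would show whether the presplit list is an actual problem, and how much is saved by keeping only the indices.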
I could be way off base here, but my concern about presplitting the data is
that you will have your data, and a second copy of your data that is something
like a list where each element contains the portion of the data for that split.
Good speed-wise, bad memory-wise. My hope with the techniqu
On Tue, Oct 11, 2011 at 7:54 AM, ivo welch wrote:
> hi josh---thx. I had a different version of this, and discarded it
> because I think it was very slow. the reason is that on each
> application, your version has to scan my (very long) data vector. (I
> have many thousand different cases, too.
hi josh---thx. I had a different version of this, and discarded it
because I think it was very slow. the reason is that on each
application, your version has to scan my (very long) data vector. (I
have many thousand different cases, too.) I presume that by() has one
scan through the vector that
Package plyr has .parallel.
Searching datatable-help for "multicore", say on Nabble here,
http://r.789695.n4.nabble.com/datatable-help-f2315188.html
yields three relevant posts and examples.
Please check wiki do's and don'ts to make sure you didn't
fall into one of those traps, though (we don't
Hi Ivo,
My suggestion would be to only pass lapply (or mclapply) the indices.
That should be fast, subsetting with data.table should also be fast,
and then you do whatever computations you will. For example:
require(data.table)
DT <- data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
se
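The example in this preview is cut off, so what follows is only a sketch of the index-passing idea, not the original code. The key on x, the helper per_group, and the use of the parallel package (instead of the multicore package that was current in 2011) are all assumptions made for illustration:

```r
library(data.table)
library(parallel)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
setkey(DT, x)  # assumed: key on the grouping column so DT[.(g)] is a keyed lookup

# Hypothetical per-group computation; replace with the real work.
per_group <- function(g) {
  sub <- DT[.(g)]        # subset for one group, taken inside the worker
  sum(sub$v * sub$y)
}

groups <- unique(DT$x)   # only these group values are handed to mclapply

# mclapply relies on forking, so use a single core on Windows.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(groups, per_group, mc.cores = cores)
names(res) <- groups
```

Only the group values travel to the workers; each worker takes its own subset, so no presplit list of the data is built up front.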
dear r experts---Is there a multicore equivalent of by(), just like
mclapply() is the multicore equivalent of lapply()?
if not, is there a fast way to convert a data.table into a list based
on a column that lapply and mclapply can consume?
advice appreciated...as always.
regards,
/iaw
Ivo
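One possible shape for a by()-like wrapper around mclapply(), offered as a sketch under assumptions rather than an established equivalent. The helper name mcby and its arguments are hypothetical, and the parallel package stands in for multicore:

```r
library(data.table)
library(parallel)

# Hypothetical helper: apply FUN to each subset of `data` defined by column `col`.
mcby <- function(data, col, FUN, cores = 2L) {
  # One pass over the column: split row indices by group.
  idx <- split(seq_len(nrow(data)), data[[col]])
  if (.Platform$OS.type == "windows") cores <- 1L  # no forking on Windows
  mclapply(idx, function(i) FUN(data[i, ]), mc.cores = cores)
}

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), v = 1:9)
out <- mcby(DT, "x", function(d) mean(d$v))
```

The grouping column is scanned once in split(), and the result is a list named by group, which is also the form lapply() would return.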