Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Hadley Wickham
On Mon, Oct 10, 2011 at 4:14 PM, Joshua Wiley wrote:
> I could be waay off base here, but my concern about presplitting the data is
> that you will have your data, and a second copy of our data that is something
> like a list where each element contains the portion of the data for that
> split.

Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Thomas Lumley
This is the sort of thing that should be measured, rather than speculated about, but if you're using multicore all those subsets can be made at the same time, not sequentially, so they add up to a copy of the whole data. Using data.table rather than a data.frame would help, of course. I would gu
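Taking up the point about measuring rather than speculating, here is a minimal sketch that compares the memory footprint of a data frame with a pre-split copy of it and with a list of row indices only, using base R's `object.size()`. The data (`df`, group column `g`, 1000 groups of 100 rows) are hypothetical, and the sizes printed will vary by R version and platform.

```r
# Hypothetical example data: 1000 groups of 100 rows each
set.seed(1)
df <- data.frame(g = rep(seq_len(1000), each = 100), v = rnorm(1e5))

# Pre-split copy: a list holding one data.frame per group
pieces <- split(df, df$g)

# Splitting only the row indices: one integer vector per group
idx <- split(seq_len(nrow(df)), df$g)

# Compare footprints (object.size() lives in the utils package)
print(object.size(df), units = "Mb")
print(object.size(pieces), units = "Mb")
print(object.size(idx), units = "Mb")
```

On data shaped like this, the index list should come out noticeably smaller than the pre-split list, because each split data.frame carries its own columns, row names and attributes.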

Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Joshua Wiley
I could be waay off base here, but my concern about presplitting the data is that you will have your data, and a second copy of your data that is something like a list where each element contains the portion of the data for that split. Good speed-wise, bad memory-wise. My hope with the techniqu

Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Thomas Lumley
On Tue, Oct 11, 2011 at 7:54 AM, ivo welch wrote:
> hi josh---thx.  I had a different version of this, and discarded it
> because I think it was very slow.  the reason is that on each
> application, your version has to scan my (very long) data vector.  (I
> have many thousand different cases, too.

Re: [R] multicore by(), like mclapply?

2011-10-10 Thread ivo welch
hi josh---thx. I had a different version of this, and discarded it because I think it was very slow. the reason is that on each application, your version has to scan my (very long) data vector. (I have many thousand different cases, too.) I presume that by() has one scan through the vector that
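The cost difference described here can be sketched with base R: locating each case with `which(g == k)` rescans the whole vector once per case, while `split()` sorts every position into its case in a single pass. This is an illustration of the scanning pattern only, not a claim about how `by()` is implemented; the vector `g` and its sizes are hypothetical.

```r
set.seed(42)
# Hypothetical long grouping vector: 1e5 rows spread over 1000 cases
g <- sample(seq_len(1000), 1e5, replace = TRUE)

# Repeated scanning: every case compares against the whole vector,
# so the work grows with (number of cases) x (vector length)
t_scan <- system.time(
  slow <- lapply(sort(unique(g)), function(k) which(g == k))
)

# Single pass: split() files every position under its case in one sweep
t_split <- system.time(
  fast <- split(seq_along(g), g)
)

# Both give the same row positions for every case, in the same order
stopifnot(identical(unname(slow), unname(fast)))

# Elapsed seconds for each approach
c(scan = t_scan[["elapsed"]], split = t_split[["elapsed"]])
```

The resulting index list is exactly the kind of object that `lapply()` or `mclapply()` can consume directly.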

Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Matthew Dowle
Package plyr's functions have a .parallel argument. Searching datatable-help for "multicore", say on Nabble here, http://r.789695.n4.nabble.com/datatable-help-f2315188.html yields three relevant posts and examples. Please check the wiki's do's and don'ts to make sure you didn't fall into one of those traps, though (we don't

Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Joshua Wiley
Hi Ivo, My suggestion would be to pass lapply (or mclapply) only the indices. That should be fast; subsetting with data.table should also be fast, and then you do whatever computations you need. For example:

require(data.table)
DT <- data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
se
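A minimal sketch of the index-passing idea, assuming the `parallel` package (bundled with R since 2.14 and the successor to `multicore`). It reuses the toy data from the message but as a base `data.frame`, so it runs without data.table; the same `DT$v[i]` indexing works on a data.table. The per-group computation (a sum of `v`) is a hypothetical stand-in.

```r
library(parallel)  # bundled with R; provides mclapply()

# Toy data from the message, as a base data.frame (no extra packages needed)
DT <- data.frame(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# One pass over the key column: a named list of row indices per group
idx <- split(seq_len(nrow(DT)), DT$x)

# mclapply() forks, which is not supported on Windows; use one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker receives only an index vector and reads DT from the forked
# (copy-on-write) memory, so no pre-split copy of the data is built
res <- mclapply(idx, function(i) sum(DT$v[i]), mc.cores = cores)
unlist(res)  # a = 6, b = 15, c = 24
```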

[R] multicore by(), like mclapply?

2011-10-10 Thread ivo welch
dear R experts---Is there a multicore equivalent of by(), just like mclapply() is the multicore equivalent of lapply()? If not, is there a fast way to convert a data.table into a list based on a column that lapply and mclapply can consume? advice appreciated...as always. regards, /iaw Ivo
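One way to get a by()-like call on top of mclapply() is sketched below. `mcby()` is a hypothetical helper, not a function from base R or any package: it splits the row indices once with `split()` and passes each group's rows to `FUN`, much as `by()` hands `FUN` a data-frame subset. The `warpbreaks` example (from the bundled datasets package) and the group mean are illustrative only.

```r
library(parallel)

# Hypothetical helper: a by()-like interface over mclapply()
mcby <- function(data, INDICES, FUN, ..., mc.cores = 1L) {
  # Group row positions in a single pass over the grouping factor
  idx <- split(seq_len(nrow(data)), INDICES, drop = TRUE)
  # Each task receives one index vector; extra arguments flow through to FUN
  mclapply(idx, function(i, ...) FUN(data[i, , drop = FALSE], ...), ...,
           mc.cores = mc.cores)
}

# Forking is unavailable on Windows, so fall back to one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Mean number of breaks per tension level
res <- mcby(warpbreaks, warpbreaks$tension,
            function(d) mean(d$breaks), mc.cores = cores)
unlist(res)
```

Unlike `by()`, this returns a plain list rather than a `"by"` object, which keeps the sketch short; the result can be checked against `tapply()` on the same data.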