[R] aggregate slow with many rows - alternative?

2005-10-13 Thread Hans-Peter
Hi, I use the code below to aggregate / cnt my test data. It works fine, but the problem is with my real data (33'000 rows) where the function is really slow (nothing happened in half an hour). Does anybody know of other functions that I could use? Thanks, Hans-Peter -- dat <- data.

Re: [R] aggregate slow with many rows - alternative?

2005-10-13 Thread Gabor Grothendieck
Convert dat to a matrix and see if working with the matrix instead of a data frame speeds things up enough. On 10/13/05, Hans-Peter <[EMAIL PROTECTED]> wrote: > Hi, > > I use the code below to aggregate / cnt my test data. It works fine, > but the problem is with my real data (33'000 rows) where t
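
A minimal sketch of the matrix suggestion above, not code from the thread. The column names D, Fid and A are borrowed from the test data posted later in the thread, and the choice of a per-group sum over A is an assumption, since the original aggregate call is cut off in the archive. data.matrix() and rowsum() are base R functions.

```r
# Hypothetical test data; the column names and sizes are assumptions.
set.seed(1)
dat <- data.frame(D   = sample(32000:33000, 33000, TRUE),
                  Fid = sample(1:10, 33000, TRUE),
                  A   = sample(1:5, 33000, TRUE))

# Convert the data frame to a numeric matrix before aggregating.
m <- data.matrix(dat)

# Sum A within each value of D; rowsum() returns a one-column matrix
# whose row names are the group values.
sums <- rowsum(m[, "A"], group = m[, "D"])

# Count the rows per group (assumed to be what "cnt" refers to).
counts <- table(m[, "D"])
```

Whether this is fast enough depends on the aggregation actually needed; rowsum() only covers sums, so other summaries would need a different approach.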

Re: [R] aggregate slow with many rows - alternative?

2005-10-13 Thread Frank E Harrell Jr
Gabor Grothendieck wrote: > Convert dat to a matrix and see if working with the > matrix instead of a data frame speeds things up > enough. In the Hmisc package the asNumericMatrix and matrix2dataFrame functions facilitate this. Also look at the summarize and mApply functions in Hmisc, which can b

Re: [R] aggregate slow with many rows - alternative?

2005-10-14 Thread TEMPL Matthias
Hi, Yesterday I analysed data with 16 rows and 10 columns. Aggregation would be impossible with a data frame format, but after converting it to a matrix with *numeric* entries (check whether the variables are of class numeric!) the computation needs only 7 seconds on a Pentium III. I'm sad

Re: [R] aggregate slow with many rows - alternative?

2005-10-14 Thread jim holtman
Here is the way that I would do it. Using 'lapply' to process the list and create a matrix takes less than 1 second:

> dat <- data.frame(D=sample(32000:33000, 33000, T),
+ Fid=sample(1:10,33000,T), A=sample(1:5,33000,T))
> system.time({
+ result <- lapply(split(seq(nrow(dat)), dat$D), function(.d)
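
The snippet above is cut off in the archive. The following is a minimal sketch of the same split/lapply idea, not the original code: the body of the function (a row count and a sum of A per value of D) is an assumption, as is the final do.call(rbind, ...) step used to build the matrix.

```r
# Test data with the same shape as in the post (seed added for repeatability).
set.seed(1)
dat <- data.frame(D   = sample(32000:33000, 33000, TRUE),
                  Fid = sample(1:10, 33000, TRUE),
                  A   = sample(1:5, 33000, TRUE))

# Split the row indices by D, so each list element holds the rows of one group.
idx <- split(seq_len(nrow(dat)), dat$D)

# Assumed summary per group: the group value, its row count, and the sum of A.
rows <- lapply(idx, function(.d) {
  c(D = dat$D[.d[1]], n = length(.d), sumA = sum(dat$A[.d]))
})

# Bind the per-group vectors into one matrix with columns D, n and sumA.
result <- do.call(rbind, rows)
```

Splitting the row indices rather than the data frame itself avoids creating one small data frame per group, which is a large part of why this approach is quick.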

Re: [R] aggregate slow with many rows - alternative?

2005-10-14 Thread Hans-Peter
Many thanks for all your answers. Converting to a matrix didn't help, and I tried Hmisc but didn't get anywhere (different summary functions, multiple levels). 2005/10/14, jim holtman <[EMAIL PROTECTED]>: > Here is the way that I would do it. Using 'lapply' to process the list and > create a ma