[R] group bunch of lines in a data.frame, an additional requirement

2006-09-13 Thread Emmanuel Levy
Thanks for pointing me to "aggregate", that works fine! There is one complication, though: I have mixed types (numerical and character), so the matrix is of the form:

A 1.0 200 ID1
A 3.0 800 ID1
A 2.0 200 ID1
B 0.5  20 ID2
B 0.9  50 ID2
C 5.0  70 ID1

One letter always has the same ID but one
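A matrix holds a single type, so mixing letters and numbers forces every cell to character. A data frame keeps one type per column, which is what aggregate() and mean() need. The following is a minimal sketch that assumes the table above is typed in by hand. The column names V1 to V4 follow the replies below, and `M` is a hypothetical name used only for the comparison.

```r
# Rebuild the example table as a data frame: one type per column.
DF <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C"),
  V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  V3 = c(200, 800, 200, 20, 50, 70),
  V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"),
  stringsAsFactors = FALSE
)
sapply(DF, class)   # V1 and V4 are character, V2 and V3 numeric

# For comparison, binding the same columns into a matrix
# coerces every cell to character, so mean() no longer applies.
M <- cbind(DF$V1, DF$V2, DF$V3, DF$V4)
mode(M)             # "character"
```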

2006-09-13 Thread Marc Schwartz (via MN)
Try something like this:

# Initial data frame
> DF
  V1  V2  V3  V4
1  A 1.0 200 ID1
2  A 3.0 800 ID1
3  A 2.0 200 ID1
4  B 0.5  20 ID2
5  B 0.9  50 ID2
6  C 5.0  70 ID1

# Now do the aggregation to get the means
DF.1 <- aggregate(DF[, 2:3], list(V1 = DF$V1), mean)

> DF.1
  V1  V2 V3
1  A 2.0
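The reply is cut off after the first row of DF.1. Because aggregate() grouped by V1 only, the ID column is dropped from the result. One way to put it back is to merge with the distinct (V1, V4) pairs. This assumes, as the question states, that each letter always maps to a single ID. It is a sketch, not necessarily how the original reply continued, and `ids` and `DF.2` are hypothetical names.

```r
DF <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C"),
  V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  V3 = c(200, 800, 200, 20, 50, 70),
  V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"),
  stringsAsFactors = FALSE
)

# Means of V2 and V3 per letter, as in the reply above
DF.1 <- aggregate(DF[, 2:3], list(V1 = DF$V1), mean)

# Lookup table of letter -> ID (one row per letter if the mapping is unique)
ids <- unique(DF[, c("V1", "V4")])

# Attach the ID to each aggregated row
DF.2 <- merge(DF.1, ids, by = "V1")
DF.2
```

If a letter ever had two IDs, `ids` would contain two rows for it, and merge() would duplicate that letter's means. The one-ID-per-letter assumption matters here.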

2006-09-13 Thread Gabor Grothendieck
See below.

On 9/13/06, Emmanuel Levy <[EMAIL PROTECTED]> wrote:
> Thanks for pointing me out "aggregate", that works fine!
>
> There is one complication though: I have mixed types (numerical and
> character),
>
> So the matrix is of the form:
>
> A 1.0 200 ID1
> A 3.0 800 ID1
> A 2.0 200 ID1
> B
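Gabor's actual answer is cut off here. One option that uses aggregate() alone is to pass both V1 and V4 as grouping columns, so the ID is carried through. His later reply in this thread groups on `DF[c(1,4)]` in the same way. The sketch below assumes one ID per letter, and `DF.3` is a hypothetical name.

```r
DF <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C"),
  V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  V3 = c(200, 800, 200, 20, 50, 70),
  V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"),
  stringsAsFactors = FALSE
)

# Group on letter and ID together; with one ID per letter,
# this yields exactly one row per letter and keeps the ID column.
DF.3 <- aggregate(DF[c("V2", "V3")], by = DF[c("V1", "V4")], FUN = mean)
DF.3
```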

2006-09-14 Thread Emmanuel Levy
Thanks Gabor, that is much faster than using a loop!

I've got one last question: can you think of a fast way of keeping track of the number of observations collapsed for each entry? I.e., I'd like to end up with:

A 2.0 400 ID1 3   (3 obs in the first matrix)
B 0.7  35 ID2 2   (2 obs in the first matrix)
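One quick way to get the counts is to let table() count rows per letter, then index that table by name to line the counts up with an aggregated result. This sketch assumes one ID per letter, and `means` and `counts` are hypothetical names.

```r
DF <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C"),
  V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  V3 = c(200, 800, 200, 20, 50, 70),
  V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"),
  stringsAsFactors = FALSE
)
means <- aggregate(DF[c("V2", "V3")], by = DF[c("V1", "V4")], FUN = mean)

# Rows per letter: A = 3, B = 2, C = 1
counts <- table(DF$V1)

# Look up each aggregated row's count by its letter
# (as.character() guards against V1 coming back as a factor)
means$n <- as.vector(counts[as.character(means$V1)])
means
```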

2006-09-14 Thread Marc Schwartz
Emmanuel, I wouldn't be surprised if Gabor comes up with something, but since aggregate() can only return scalars, you can't do it in one step here. There are possibilities using other functions such as split(), tapply() or by(), but each has its own respective limitations requiring more than one s
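The point about needing more than one step can be illustrated with tapply(). Each call returns one summary per letter, and the pieces are then assembled into a data frame. This sketch assumes one ID per letter and takes that ID from each group's first row. `n`, `m2`, `m3`, `id` and `res` are hypothetical names.

```r
DF <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C"),
  V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  V3 = c(200, 800, 200, 20, 50, 70),
  V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"),
  stringsAsFactors = FALSE
)

# One tapply() call per summary; each result is named by letter (A, B, C)
n  <- tapply(DF$V2, DF$V1, length)
m2 <- tapply(DF$V2, DF$V1, mean)
m3 <- tapply(DF$V3, DF$V1, mean)
id <- tapply(DF$V4, DF$V1, function(v) v[1])   # first (and only) ID per letter

# Assemble the per-letter vectors; all share the same letter order
res <- data.frame(V1 = names(n), V2 = as.vector(m2), V3 = as.vector(m3),
                  V4 = as.vector(id), n = as.vector(n),
                  stringsAsFactors = FALSE)
res
```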

2006-09-14 Thread Gabor Grothendieck
Here are three different ways to do it:

# base R
fb <- function(x) c(V1 = x$V1[1], V4 = x$V4[1], V2.mean = mean(x$V2),
  V3.mean = mean(x$V3), n = length(x$V1))
do.call(rbind, by(DF, DF[c(1,4)], fb))

# package doBy
library(doBy)
summaryBy(V2 + V3 ~ V1 + V4, DF, FUN = c(mean, length))[,-5]
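fb() builds its row with c(), which forces all five values to a single type. With character columns the means become strings. With factor columns, which were the data.frame() default at the time, x$V1[1] may turn into its integer code. A variant that keeps the types returns a one-row data frame per group instead. The sketch below also swaps by() for split() plus lapply(), splits on V1 alone on the assumption of one ID per letter, and uses the hypothetical name `fb2`.

```r
DF <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C"),
  V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  V3 = c(200, 800, 200, 20, 50, 70),
  V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"),
  stringsAsFactors = FALSE
)

# Return a one-row data frame per group so each column keeps its own type
fb2 <- function(x) {
  data.frame(V1 = x$V1[1], V4 = x$V4[1],
             V2.mean = mean(x$V2), V3.mean = mean(x$V3),
             n = nrow(x), stringsAsFactors = FALSE)
}

# split() on V1 gives one sub-data-frame per letter; rbind the summaries
res <- do.call(rbind, lapply(split(DF, DF$V1), fb2))
res
```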

2006-09-15 Thread Emmanuel Levy
(re)-Hello,

I actually thought about another possibility: a "1" column, a sum (instead of a mean), and a division of the columns for which I want the mean:

> DF = data.frame( V1=c("A","A","A","B","B","C") , V2=c(1,3,2,0.5,0.9,5.0),
> V3=c(200,800,200,20,50,70), V4=c("ID1","ID1","ID1","ID2","ID
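The data-frame definition above is cut off, so this sketch reuses the table from the start of the thread, where C has ID1. It spells out the idea in three steps: add a column of ones, sum every column per group, then divide the summed columns by the summed ones, which equal the group size. `one` and `s` are hypothetical names.

```r
DF <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C"),
  V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  V3 = c(200, 800, 200, 20, 50, 70),
  V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"),
  stringsAsFactors = FALSE
)

# Column of ones: its per-group sum is the number of rows in the group
DF$one <- 1

# Sum everything per (V1, V4) group in a single aggregate() call
s <- aggregate(DF[c("V2", "V3", "one")], by = DF[c("V1", "V4")], FUN = sum)

# Turn sums into means by dividing by the group size
s$V2 <- s$V2 / s$one
s$V3 <- s$V3 / s$one
s
```

The counts come out of the same aggregate() call as the sums. That avoids the one-scalar-per-column limitation mentioned earlier in the thread.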

2006-09-15 Thread Gabor Grothendieck
Good idea. You could write it compactly like this:

> transform(aggregate(cbind(DF[2:3], o = 1), DF[c(1,4)], sum, na.rm = TRUE),
+   V2 = V2/o, V3 = V3/o)
  V1  V4  V2  V3 o
1  A ID1 2.0 400 3
2  B ID2 0.7  35 2
3  C ID3 5.0  70 1

On 9/15/06, Emmanuel Levy <[EMAIL PROTECTED]> wrote:
> (re)-Hello