[R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?

2008-08-12 Thread Emmanuel Levy
Dear All, I have a large data frame ( 270 lines and 14 columns), and I would like to extract the information in a particular way illustrated below: Given a data frame "df":
> col1=sample(c(0,1),10, rep=T)
> names = factor(c(rep("A",5),rep("B",5)))
> df = data.frame(names,col1)
> df
names

Re: [R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?

2008-08-12 Thread Peter Cowan
Emmanuel, On Tue, Aug 12, 2008 at 4:35 PM, Emmanuel Levy <[EMAIL PROTECTED]> wrote: > Dear All, > > I have a large data frame ( 270 lines and 14 columns), and I would like to > extract the information in a particular way illustrated below: > > > Given a data frame "df": > >> col1=sample(c(0,1)

Re: [R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?

2008-08-12 Thread Henrik Bengtsson
To simplify:
n <- 2.7e6;
x <- factor(c(rep("A", n/2), rep("B", n/2)));
# Identify 'A':s
t1 <- system.time(res <- which(x == "A"));
# To compare a factor to a string, the factor is in practice
# coerced to a character vector.
t2 <- system.time(res <- which(as.character(x) == "A"));
# Interesting
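Since the comment above notes that comparing a factor to a string coerces the factor to character, one possible way around that cost is to compare the integer codes instead. This is only a sketch and not necessarily what the truncated message goes on to show; the names `code`, `idx_codes` and `idx_chr` are made up for illustration, and `n` is kept small so it runs quickly.

```r
# Small stand-in for the large factor in the thread (size is an assumption).
n <- 1e5
x <- factor(c(rep("A", n/2), rep("B", n/2)))

# Look up the integer code of level "A" once ...
code <- match("A", levels(x))

# ... then compare integer codes rather than strings.
idx_codes <- which(as.integer(x) == code)

# Reference result using the ordinary factor-to-string comparison.
idx_chr <- which(x == "A")
```

Both vectors hold the same positions; wrapping each `which()` call in `system.time()` on the full-size data would show whether the integer comparison actually helps on a given machine.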

Re: [R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread Emmanuel Levy
Dear Peter and Henrik, Thanks for your replies - this helps speed up a bit, but I thought there would be something much faster. What I mean is that I thought that a particular value of a level could be accessed instantly, similarly to a "hash" key. Since I've got about 6000 levels in that data f
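The "hash key" idea described above could be sketched roughly as follows: build a table of row indices per level once, store it in a hashed environment, and then look levels up by name. The data frame here is a toy stand-in and `index` and `rows_A` are hypothetical names, not anything from the original posts.

```r
set.seed(1)
# Toy data frame standing in for the large one (an assumption).
df <- data.frame(
  name = factor(sample(c("A", "B", "C"), 20, replace = TRUE)),
  info = sample(c(0, 1), 20, replace = TRUE)
)

# Build the lookup table once: one integer vector of row numbers per level,
# stored in a hashed environment keyed by level name.
index <- list2env(split(seq_len(nrow(df)), df$name), hash = TRUE)

# Later lookups go by key and do not rescan the column.
rows_A <- get("A", envir = index)
info_A <- df$info[rows_A]
```

The one-off cost is the call to `split()`; after that, each lookup is a name lookup in the environment rather than a full comparison over the column.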

Re: [R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread Erik Iverson
I still don't understand what you are doing. Can you make a small example that shows what you have and what you want? Is ?split what you are after? Emmanuel Levy wrote: Dear Peter and Henrik, Thanks for your replies - this helps speed up a bit, but I thought there would be something much fas

Re: [R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread Emmanuel Levy
Sorry for being unclear, I thought the example above was clear enough. I have a data frame of the form:
   name    info
1  YAL001C 1
2  YAL001C 1
3  YAL001C 1
4  YAL001C 1
5  YAL001C 0
6  YAL001C 1
7  YAL001C 1
8  YAL001C 1
9  YAL001C 1
10 YAL001C 1
.

Re: [R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread Emmanuel Levy
Wow great! Split was exactly what was needed. It takes about 1 second for the whole operation :D Thanks again - I can't believe I never used this function in the past. All the best, Emmanuel 2008/8/13 Erik Iverson <[EMAIL PROTECTED]>: > I still don't understand what you are doing. Can you mak

Re: [R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread jim holtman
split is probably what you are after. Here is an example:
> n <- 270
> x <- data.frame(name=sample(1:6000,n,TRUE), value=runif(n))
> # split it into 6000 lists
> system.time(y <- split(x$value, x$name))
   user  system elapsed
   0.80    0.20    1.07
> str(y[1:10])
List of 10
 $ 1 : num [1:45
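For anyone who wants to try this on a small scale, here is a minimal sketch of the same idea with far fewer rows and groups (both sizes are arbitrary assumptions). It shows that the result of `split()` is a list keyed by the group label as a character string, and that each element matches what a direct subset would give. The names `key`, `vals` and `means` are only for illustration.

```r
set.seed(42)
n <- 1000
x <- data.frame(name = sample(1:60, n, TRUE), value = runif(n))

# One pass over the data produces every group at once.
y <- split(x$value, x$name)

# Groups are keyed by the label converted to character.
key  <- names(y)[1]
vals <- y[[key]]

# Per-group summaries then need no further subsetting of x.
means <- vapply(y, mean, numeric(1))
```

Here `vals` holds the same values, in the same order, as `x$value[x$name == as.integer(key)]`.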

Re: [R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread jim holtman
If you want the index, then use:
> system.time(y <- split(seq(nrow(x)), x$name))
   user  system elapsed
   0.81    0.06    0.88
> str(y[1:10])
List of 10
 $ 1 : int [1:454] 6924 17503 26880 39197 42881 50835 57896 62624 65767 75359 ...
 $ 2 : int [1:440] 9954 25619 25761 33776 56651 60372 61042 6