If you want the index, then use:
> system.time(y <- split(seq(nrow(x)), x$name))
   user  system elapsed
   0.81    0.06    0.88
> str(y[1:10])
List of 10
$ 1 : int [1:454] 6924 17503 26880 39197 42881 50835 57896 62624
65767 75359 ...
$ 2 : int [1:440] 9954 25619 25761 33776 56651 60372 61042 6
split is probably what you are after. Here is an example:
> n <- 2.7e6
> x <- data.frame(name=sample(1:6000,n,TRUE), value=runif(n))
> # split it into 6000 lists
> system.time(y <- split(x$value, x$name))
   user  system elapsed
   0.80    0.20    1.07
> str(y[1:10])
List of 10
$ 1 : num [1:45
Wow great! Split was exactly what was needed. It takes about 1 second
for the whole operation :D
Thanks again - I can't believe I never used this function in the past.
All the best,
Emmanuel
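
For readers following along, here is a minimal sketch of how the result of split() can be used for lookups by name. The data frame below is made up for illustration (only the name YAL001C appears in the thread), and the variable names values_by_name and rows_by_name are hypothetical.

```r
# Illustrative data only: a tiny data frame shaped like the one in the thread.
df <- data.frame(
  name = c("YAL001C", "YAL001C", "YAL002W", "YAL001C", "YAL002W"),
  info = c(1, 1, 0, 0, 1),
  stringsAsFactors = TRUE
)

# One pass over the data: a named list with one element per level of 'name'.
values_by_name <- split(df$info, df$name)

# Same idea, but keeping the row indices instead of the values.
rows_by_name <- split(seq_len(nrow(df)), df$name)

# Each group can then be fetched by its name.
values_by_name[["YAL001C"]]
rows_by_name[["YAL002W"]]
```

The grouping is computed once, so later lookups do not rescan the whole column. Whether this is fast enough depends on the data; the timings quoted in this thread are the only measurements available here.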
2008/8/13 Erik Iverson <[EMAIL PROTECTED]>:
> I still don't understand what you are doing. Can you mak
Sorry for being unclear, I thought the example above was clear enough.
I have a data frame of the form:
name info
1 YAL001C 1
2 YAL001C 1
3 YAL001C 1
4 YAL001C 1
5 YAL001C 0
6 YAL001C 1
7 YAL001C 1
8 YAL001C 1
9 YAL001C 1
10 YAL001C 1
.
I still don't understand what you are doing. Can you make a small
example that shows what you have and what you want?
Is ?split what you are after?
Emmanuel Levy wrote:
Dear Peter and Henrik,
Thanks for your replies - this helps speed up a bit, but I thought
there would be something much fas
Dear Peter and Henrik,
Thanks for your replies - this helps speed up a bit, but I thought
there would be something much faster.
What I mean is that I thought that a particular value of a level
could be accessed instantly, similarly to a "hash" key.
Since I've got about 6000 levels in that data f
To simplify:
n <- 2.7e6;
x <- factor(c(rep("A", n/2), rep("B", n/2)));
# Identify 'A':s
t1 <- system.time(res <- which(x == "A"));
# To compare a factor to a string, the factor is in practice
# coerced to a character vector.
t2 <- system.time(res <- which(as.character(x) == "A"));
# Interesting
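
The rest of this message is cut off in the archive listing. As a rough sketch of one way to avoid the character coercion described in the comments above (an assumption, not necessarily what the original message went on to show), the comparison can be done on the factor's integer codes. The names code_A, idx_chr and idx_int are hypothetical, and n is reduced to keep the example quick.

```r
n <- 1e5  # smaller than the 2.7e6 used in the thread, to keep the sketch quick
x <- factor(c(rep("A", n / 2), rep("B", n / 2)))

# Comparison as written in the thread.
idx_chr <- which(x == "A")

# Assumed alternative: find the integer code of level "A" once,
# then compare the factor's integer codes directly.
code_A <- match("A", levels(x))
idx_int <- which(as.integer(x) == code_A)

# Both approaches should select the same positions.
identical(idx_chr, idx_int)
```

Wrapping each which() call in system.time(), as in the message above, would show whether the integer comparison actually helps on a given R version and data size.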
Emmanuel,
On Tue, Aug 12, 2008 at 4:35 PM, Emmanuel Levy <[EMAIL PROTECTED]> wrote:
> Dear All,
>
> I have a large data frame (2,700,000 lines and 14 columns), and I would like to
> extract the information in a particular way illustrated below:
>
>
> Given a data frame "df":
>
>> col1=sample(c(0,1)
Dear All,
I have a large data frame (2,700,000 lines and 14 columns), and I would like to
extract the information in a particular way illustrated below:
Given a data frame "df":
> col1=sample(c(0,1),10, rep=T)
> names = factor(c(rep("A",5),rep("B",5)))
> df = data.frame(names,col1)
> df
names