RE: [R] row selection based on median in data frame

Nick.Ellis Thu, 01 Apr 2004 16:57:39 -0800

> tmp
  row.labels        a b  c 
1          1 deadlift 7 13
2          2    squat 7 24
3          3    clean 7 10
4          4 deadlift 8  8
5          5    squat 8 20
6          6    clean 8  2
7          7 deadlift 9  5
8          8    squat 9 32
9          9    clean 9 19
> tapply(tmp$c,tmp$a,median)
 clean deadlift squat 
    10        8    24
> tmp[tapply(1:nrow(tmp),tmp$a,function(i,x) {x <- x[i]; i[x==median(x)]}, x=tmp$c),]
  row.labels        a b  c 
3          3    clean 7 10
4          4 deadlift 8  8
2          2    squat 7 24
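
To reproduce the session above without the original file, here is a minimal self-contained sketch. It assumes the data frame can be rebuilt from the printed output (the row.labels column is left out), and the names pick and res are only illustrative. It also wraps the index in unlist(), on the assumption that some groups may return zero or several indices, in which case tapply gives back a list rather than a vector.

```r
# Rebuild the example data from the printed session (row.labels omitted).
tmp <- data.frame(
  a = rep(c("deadlift", "squat", "clean"), times = 3),
  b = rep(7:9, each = 3),
  c = c(13, 24, 10, 8, 20, 2, 5, 32, 19)
)

# For each level of a, keep the row indices whose c equals the group median.
pick <- tapply(seq_len(nrow(tmp)), tmp$a,
               function(i, x) i[x[i] == median(x[i])],
               x = tmp$c)

# unlist() flattens the result when tapply returns a list
# (groups with ties, or even-sized groups with no matching row).
res <- tmp[unlist(pick, use.names = FALSE), ]
print(res)
```

With this data each group has exactly one row at its median, so the selection is rows 3, 4 and 2, as in the session.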


If you have multiple grouping variables gp1, gp2, gp3, simply pass them as the 2nd
argument:

tmp[tapply(1:nrow(tmp), tmp[c("gp1","gp2","gp3")],
           function(i,x) {x <- x[i]; i[x==median(x)]}, x=tmp$c),]
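
One thing to watch with several grouping columns: combinations that never occur in the data become NA cells in the tapply result, and indexing with NA yields all-NA rows. The sketch below is one possible way around that, using split() and lapply() in place of tapply. The columns gp1 and gp2 and the data are made up for illustration, and only two grouping columns are shown.

```r
# Hypothetical data: gp1 and gp2 are grouping columns, c is the value column.
tmp <- data.frame(
  gp1 = c("x", "x", "x", "y", "y", "y", "y"),
  gp2 = c("p", "p", "p", "q", "q", "q", "q"),
  c   = c(3, 1, 2, 10, 40, 20, 30)
)

# One group per combination that actually occurs; drop = TRUE omits empty ones.
groups <- interaction(tmp[c("gp1", "gp2")], drop = TRUE)

# Row indices, per group, whose c equals that group's median.
idx <- lapply(split(seq_len(nrow(tmp)), groups),
              function(i) i[tmp$c[i] == median(tmp$c[i])])

# A group with an even number of rows may contribute no row at all,
# because its median need not be one of the observed values.
res <- tmp[unlist(idx, use.names = FALSE), ]
print(res)
```

Here the x/p group has median 2 and contributes row 3, while the y/q group has median 25, which matches no row, so it contributes nothing.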

Nick Ellis
CSIRO Marine Research   mailto:[EMAIL PROTECTED]
PO Box 120                      ph    +61 (07) 3826 7260
Cleveland QLD 4163      fax   +61 (07) 3826 7222
Australia                       http://www.marine.csiro.au
  
> 
> 
> ------------------------------
> 
> Message: 75
> Date: Wed, 31 Mar 2004 22:22:22 -0500
> From: Ed L Cashin <[EMAIL PROTECTED]>
> Subject: [R] row selection based on median in data frame
> To: [EMAIL PROTECTED]
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=us-ascii
> 
> Hi.  I am having trouble thinking of an easy way to grab rows out of a
> data frame.  I want to select the rows with a median value when the
> rows are similar.
> 
> A simple example is this table, which I could read into a data frame.
> I would like to find a new data frame with only the rows with a median
> value for the "c" column given a certain "a" value.
> 
> For example, the c values for deadlift rows are 13, 8, and 5, so the
> row with a c value of 8 should show up in the output.
> 
>      a         b   c
>   1  deadlift  7  13
>   2  squat     7  24
>   3  clean     7  10
>   4  deadlift  8   8
>   5  squat     8  20
>   6  clean     8   2
>   7  deadlift  9   5
>   8  squat     9  32
>   9  clean     9  19
> 
> Result:
> 
>      a         b   c
>   4  deadlift  8   8
>   5  squat     8  20
>   3  clean     7  10
> 
> It's more complicated in my case, because I have not just one "a"
> column, but about eight columns that have to be the same.  I can do
> this with clumsy loops, but I wonder whether there's a better way.
> 
> -- 
> --Ed L Cashin            |   PGP public key:
>   [EMAIL PROTECTED]        |   http://noserose.net/e/pgp/
>

