Dear R-users:

I have a large dataframe with the following format:

>plants

id      trt     year    size    num     spA     spB     spZ
                                                        
1011a   1       1       23.2    3       12      3.2     8
1011a   1       2       17.9    2       10      5.1     2.8
1011a   1       3       12.5    7       12      0       0.5
1011b   2       1       NA      NA      NA      NA      NA
1011b   2       2       6       6       4       2       0
1011b   2       3       100.3   5       3       95      2.3
28105a  1       1       9.1     8       0.5     0       8.6
28105a  1       2       16.6    4       2       12      4.6
28105a  1       3       8.7     7       1       0.2     7.5
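
For anyone who wants to try this, the nine rows above can be rebuilt with read.table(). This is only a convenience sketch: it assumes the columns are whitespace-separated and that the literal NA marks a missing value.

```r
# Rebuild the example rows; read.table() treats the string "NA" as missing by default.
plants <- read.table(text = "
id      trt  year  size   num  spA  spB  spZ
1011a   1    1     23.2   3    12   3.2  8
1011a   1    2     17.9   2    10   5.1  2.8
1011a   1    3     12.5   7    12   0    0.5
1011b   2    1     NA     NA   NA   NA   NA
1011b   2    2     6      6    4    2    0
1011b   2    3     100.3  5    3    95   2.3
28105a  1    1     9.1    8    0.5  0    8.6
28105a  1    2     16.6   4    2    12   4.6
28105a  1    3     8.7    7    1    0.2  7.5
", header = TRUE, stringsAsFactors = FALSE)

str(plants)                      # column types as R read them
sum(!complete.cases(plants))     # number of rows containing missing data
```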


I am looking for advice on how to select the subset of rows belonging to
each id (the id values are non-sequential), apply a series of functions to
that subset (excluding rows with missing data), and collect the results in a
new dataframe with one row per unique id.  I need to perform the following
calculations on each id subset:

1) for all numeric columns: mean, standard deviation, and variance

2) for columns "spA" to "spZ": sum of all elements of the covariance matrix,
and sum of the variances of the individual columns

3) for columns "size" and "year": linear regression of form lm(size~year)
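
As a rough illustration of calculations 1 and 2 on a single id, the sketch below works on id 1011a only. The names one, num.cols and sp.cols are made up for the example, and it assumes that "all columns" means the numeric measurement columns and that spA to spZ are the three species columns shown.

```r
plants <- read.table(text = "
id      trt  year  size   num  spA  spB  spZ
1011a   1    1     23.2   3    12   3.2  8
1011a   1    2     17.9   2    10   5.1  2.8
1011a   1    3     12.5   7    12   0    0.5
1011b   2    1     NA     NA   NA   NA   NA
1011b   2    2     6      6    4    2    0
1011b   2    3     100.3  5    3    95   2.3
28105a  1    1     9.1    8    0.5  0    8.6
28105a  1    2     16.6   4    2    12   4.6
28105a  1    3     8.7    7    1    0.2  7.5
", header = TRUE, stringsAsFactors = FALSE)

# Rows for one id, with any row containing missing data dropped.
one <- na.omit(plants[plants$id == "1011a", ])

num.cols <- c("size", "num", "spA", "spB", "spZ")  # assumed "all columns"
sp.cols  <- c("spA", "spB", "spZ")                 # assumed species columns

# 1) per-column mean, standard deviation and variance
col.mean <- colMeans(one[, num.cols])
col.sd   <- sapply(one[, num.cols], sd)
col.var  <- sapply(one[, num.cols], var)

# 2) covariance matrix of the species columns
sp.cov      <- cov(one[, sp.cols])
sum.spcovar <- sum(sp.cov)        # every element, variances and covariances
sum.spvar   <- sum(diag(sp.cov))  # the diagonal holds the column variances

round(rbind(col.mean, col.sd, col.var), 2)
```

Taking the variances from the diagonal of the covariance matrix avoids computing them a second time.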


Ideally my new dataframes would have the following formats:

>plants.calc

id     trt  mean.size  sd.size  mean.num  sd.num  sum.spcovar  sum.spvar  mean.spA  sd.spA  var.spA

1011a  1    17.9       5.4      4.0       2.6     17.12        22.74      11.33     1.15    1.33
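
One way a table of this shape might be assembled is to split the data by id, summarise each piece, and bind the resulting rows together. In the sketch below calc.one is a hypothetical helper, not an existing function, and it assumes every id keeps at least two complete rows after na.omit().

```r
plants <- read.table(text = "
id      trt  year  size   num  spA  spB  spZ
1011a   1    1     23.2   3    12   3.2  8
1011a   1    2     17.9   2    10   5.1  2.8
1011a   1    3     12.5   7    12   0    0.5
1011b   2    1     NA     NA   NA   NA   NA
1011b   2    2     6      6    4    2    0
1011b   2    3     100.3  5    3    95   2.3
28105a  1    1     9.1    8    0.5  0    8.6
28105a  1    2     16.6   4    2    12   4.6
28105a  1    3     8.7    7    1    0.2  7.5
", header = TRUE, stringsAsFactors = FALSE)

sp.cols <- c("spA", "spB", "spZ")  # assumed species columns

# Hypothetical helper: one row of summaries for the rows of a single id.
calc.one <- function(d) {
  d <- na.omit(d)                  # drop rows with missing data
  sp.cov <- cov(d[, sp.cols])
  data.frame(id = d$id[1], trt = d$trt[1],
             mean.size = mean(d$size), sd.size = sd(d$size),
             mean.num = mean(d$num), sd.num = sd(d$num),
             sum.spcovar = sum(sp.cov), sum.spvar = sum(diag(sp.cov)),
             mean.spA = mean(d$spA), sd.spA = sd(d$spA), var.spA = var(d$spA))
}

# split() gives one data frame per id; rbind() stacks the one-row results.
plants.calc <- do.call(rbind, lapply(split(plants, plants$id), calc.one))
rownames(plants.calc) <- NULL
print(plants.calc, digits = 4)
```

Only spA is summarised here, to match the columns shown; further species columns would need further entries in the data.frame() call.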


>plants.lm

id     intercept  se.intercept  estimate  se.estimate  adj.Rsq  Tvalue  Pvalue  N

1011a  28.57      0.06          -5.35     0.03         0.9999   458.09  0.0014  3
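
A regression table like this could be sketched by fitting lm(size ~ year) within each id and flattening the coefficient table returned by summary(). Here lm.one is a hypothetical helper, and Tvalue and Pvalue are assumed to refer to the intercept, which is what the example values for 1011a are consistent with. An id with only two complete rows has no residual degrees of freedom, so its standard errors come back as NaN; an id with a single complete row would fail outright.

```r
plants <- read.table(text = "
id      trt  year  size   num  spA  spB  spZ
1011a   1    1     23.2   3    12   3.2  8
1011a   1    2     17.9   2    10   5.1  2.8
1011a   1    3     12.5   7    12   0    0.5
1011b   2    1     NA     NA   NA   NA   NA
1011b   2    2     6      6    4    2    0
1011b   2    3     100.3  5    3    95   2.3
28105a  1    1     9.1    8    0.5  0    8.6
28105a  1    2     16.6   4    2    12   4.6
28105a  1    3     8.7    7    1    0.2  7.5
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: fit size ~ year for one id and return one row.
lm.one <- function(d) {
  d  <- na.omit(d[, c("id", "year", "size")])
  s  <- summary(lm(size ~ year, data = d))
  co <- coef(s)  # rows "(Intercept)" and "year"; columns hold estimate, SE, t, p
  data.frame(id = d$id[1],
             intercept    = co["(Intercept)", "Estimate"],
             se.intercept = co["(Intercept)", "Std. Error"],
             estimate     = co["year", "Estimate"],
             se.estimate  = co["year", "Std. Error"],
             adj.Rsq      = s$adj.r.squared,
             Tvalue       = co["(Intercept)", "t value"],   # assumed: intercept
             Pvalue       = co["(Intercept)", "Pr(>|t|)"],  # assumed: intercept
             N            = nrow(d))
}

plants.lm <- do.call(rbind, lapply(split(plants, plants$id), lm.one))
rownames(plants.lm) <- NULL
print(plants.lm, digits = 4)
```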


I am very new to R.  I have written the code below, from which I can
successfully extract the summed covariance values but nothing else,
because I cannot figure out how (or whether it is possible) to extract the
relevant columns from a list.  Any help you can offer would be greatly
appreciated.

Thanks,
Claire.


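n <- length(unique(plants$id))
# split() returns a list with one data frame per id; head() keeps the first 3 rows of each
output <- lapply(split(plants, plants$id), head, 3)
out <- as.array(output)

sum.spcovar <- NULL
col.mean <- NULL
col.sd <- NULL
col.var <- NULL
sum.spvar <- NULL

# sums each column of the covariance matrix
spcovar <- function(x) {colSums(var(x))}

for(i in seq_len(n)){

     sum.spcovar[i] <- sum(spcovar(out[[i]]))

     # this is the part I cannot get to work: colMeans() and sd() are meant
     # to give one value per column for the i-th id
     col.mean[i] <- colMeans(out[[i]])
     col.sd[i] <- sd(out[[i]])
     col.var[i] <- (sd(out[[i]])^2)
     sum.spvar[i] <- sum((sd(out[[i]]))^2)

  }

plants.calc <- data.frame(unique(plants$id),
     rep(1:2, length(unique(plants$id))), sum.spcovar,
     sum.spvar, col.mean, col.sd, col.var)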
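
On pulling columns out of a list: split() returns a named list of data frames, so each element can be selected with [[ ]] and then indexed like any other data frame. The sketch below uses a small toy table with made-up values (toy, pieces and col.means are names invented for the illustration) and shows sapply() collecting a vector of column means per id, which a single element such as col.mean[i] cannot hold.

```r
# Toy data standing in for the real table; the values are made up.
toy <- data.frame(id  = c("a", "a", "b", "b"),
                  spA = c(1, 3, 5, 9),
                  spB = c(2, 2, 4, 8),
                  stringsAsFactors = FALSE)

pieces <- split(toy, toy$id)           # named list, one data frame per id

pieces[["a"]]                          # one element, selected by name
pieces[["a"]][, c("spA", "spB")]       # only the species columns of that element
pieces[["a"]]$spA                      # a single column as a plain vector

# Each call returns one mean per column, so sapply() builds a matrix:
# one row per species column, one column per id.
col.means <- sapply(pieces, function(d) colMeans(d[, c("spA", "spB")]))
t(col.means)                           # transposed: one row per id
```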


