?scale

Bert Gunter
Genentech Nonclinical Biostatistics

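For readers following the pointer to ?scale, here is a minimal sketch of how scale() might be applied to the example quoted below. It is an illustration under stated assumptions, not code from this thread: the frame is cut down to three columns, and the names vars, idx and out are made up for the example. It relies on the documented behaviour that scale(x, center = FALSE, scale = v) divides each column of x by the matching element of v, and it works on a numeric matrix one group at a time rather than calling by() once per column.

```r
# Rebuild a small version of the example frame (3 columns instead of 7)
set.seed(1234)
frame <- data.frame(group = rep(paste("group", 1:10), 10),
                    a = rnorm(100), b = rnorm(100), c = rnorm(100))
frame <- frame[order(frame$group), ]
vars <- names(frame)[-1]                       # hypothetical name: columns to rescale
for (v in vars) frame[[v]][sample(100, 60)] <- NA

# Row positions belonging to each group
idx <- split(seq_len(nrow(frame)), frame$group)

# Work on a numeric matrix; divide each column by its within-group mean
out <- as.matrix(frame[vars])
for (rows in idx) {
  m <- out[rows, , drop = FALSE]
  # center = FALSE: no subtraction; scale = group means: column-wise division
  out[rows, ] <- scale(m, center = FALSE, scale = colMeans(m, na.rm = TRUE))
}

head(out)
```

If a group has no non-missing values in some column, its mean is NaN and the rescaled values for that group stay missing. Whether this is fast enough on several thousand rows by 2.5 thousand columns would need to be timed on the real data; the sketch only avoids the per-column by() calls.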
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dimitri Liakhovitski
Sent: Tuesday, March 30, 2010 8:05 AM
To: r-help
Subject: [R] Code is too slow: mean-centering variables in a data frame by subgroup

Dear R-ers,

I have a large data frame (several thousands of rows and about 2.5 thousand columns). One variable ("group") is a grouping variable with over 30 levels. And I have a lot of NAs.

For each variable, I need to divide each value by variable mean - by subgroup. I have the code but it's way too slow - takes me about 1.5 hours.

Below is a data example and my code that is too slow. Is there a different, faster way of doing the same thing?

Thanks a lot for your advice!
Dimitri

# Building an example frame - with groups and a lot of NAs:
set.seed(1234)
frame <- data.frame(group = rep(paste("group", 1:10), 10),
                    a = rnorm(1:100), b = rnorm(1:100), c = rnorm(1:100),
                    d = rnorm(1:100), e = rnorm(1:100), f = rnorm(1:100),
                    g = rnorm(1:100))
frame <- frame[order(frame$group), ]
names.used <- names(frame)[2:length(frame)]
set.seed(1234)
for (i in names.used) {
  i.for.NA <- sample(1:100, 60)
  frame[[i]][i.for.NA] <- NA
}
frame

### Code that does what's needed but is too slow:
Start <- Sys.time()
frame <- do.call(cbind, lapply(names.used, function(x) {
  unlist(by(frame, frame$group, function(y) y[, x] / mean(y[, x], na.rm = TRUE)))
}))
Finish <- Sys.time()
print(Finish - Start)  # Takes too long

--
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.