Hi, I am trying to compute scores for a new observation based on a PCA previously computed with the PCAgrid() function in the pcaPP package. My data has more variables than observations.
Here is an imaginary data set to show the case:

> library(MASS)    # for mvrnorm()
> library(pcaPP)   # for PCAgrid()
> n.samples<-30
> n.bins<-1000
> x.sim<-rep(0,n.bins)
> V.sim<-diag(n.bins)
> mtx<-array(dim=c(n.samples,n.bins))
> for(i in 1:n.samples) mtx[i,]<-mvrnorm(1,x.sim,V.sim)

With prcomp() I can do the following:

> pc.pr2<-prcomp(mtx,scale=TRUE)
> newscr.pr2<-scale(t(mtx[1,]),pc.pr2$center,pc.pr2$scale)%*%pc.pr2$rotation

The latter computes the scores for the first row of mtx. I can verify that these scores are the same as the ones computed originally by comparing with

> pc.pr2$x[1,] # prints the scores for the first observation

Now, if I try the same with PCAgrid() as follows:

> pc.pp2<-PCAgrid(mtx,k=min(dim(mtx)),scale=mad)
> newscr.pp2<-scale(t(mtx[1,]),pc.pp2$center,pc.pp2$scale)%*%pc.pp2$loadings

the values in newscr.pp2 do not match the scores in the pc.pp2 object, as can be verified by comparing with:

> pc.pp2$x[1,]

I wonder what I am missing? Or is it the case that, for the grid method, scores cannot be computed from the loadings and the original observations?

When p<n, i.e. when there are more observations than variables, the scores computed from the loadings match the scores in the model object for the PCAgrid() method as well. So the behaviour described above seems to be specific to the case p>n.

Many thanks for any help,
Kari
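
P.S. In case it helps anyone reproduce this, here is a minimal sketch of the projection I am applying, written with base R only so that it runs without extra packages. The helper name project_scores() is my own (hypothetical, not part of pcaPP or stats), and I use rnorm() in place of mvrnorm() on the assumption that the two are equivalent here because the covariance is the identity. The sketch only shows that centring, scaling and multiplying by the loadings reproduces the prcomp() scores when p>n; it does not explain the PCAgrid() behaviour.

```r
# Hypothetical helper: centre and scale new rows with the stored values,
# then project them onto the stored loadings.
project_scores <- function(newdata, center, scale, loadings) {
  if (is.null(dim(newdata))) newdata <- t(newdata)  # vector -> 1 x p matrix
  z <- sweep(newdata, 2, center, "-")               # subtract column centres
  z <- sweep(z, 2, scale, "/")                      # divide by column scales
  z %*% loadings
}

set.seed(1)
n.samples <- 30
n.bins <- 1000
# i.i.d. standard normal entries (assumed equivalent to mvrnorm with identity covariance)
mtx <- matrix(rnorm(n.samples * n.bins), nrow = n.samples)

pc <- prcomp(mtx, scale. = TRUE)
new.scr <- project_scores(mtx[1, ], pc$center, pc$scale, pc$rotation)

# Largest absolute difference between recomputed and stored scores
max.diff <- max(abs(as.numeric(new.scr) - as.numeric(pc$x[1, ])))
print(max.diff)
```

The same helper could be called with pc.pp2$center, pc.pp2$scale and pc.pp2$loadings to quantify how far the recomputed PCAgrid() scores are from the stored ones.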