Dear R-list,

I am not sure whether what I have found in the DAAG package is a bug.
When using cv.lm() from DAAG, I found what may be a problem: the function apparently cannot handle a linear model with more than one predictor variable, yet the help page does not mention this limitation. The transcript below illustrates what I found:

> library(DAAG)
> xx <- matrix(rnorm(20*3), ncol=3)
> bb <- c(1, 2, 0)
> yy <- xx %*% bb + rnorm(20, 0, 10)
> data <- data.frame(y=yy, x=xx)
> myformula <- formula("y ~ x.1 + x.2 + x.3")
> cv.lm(data, myformula, plotit=FALSE, printit=TRUE)
Analysis of Variance Table

Response: yv
          Df Sum Sq Mean Sq F value Pr(>F)
xv         1     37      37    0.29    0.6
Residuals 18   2288     127

fold 1
Observations in test set: 4 6 7 9 10 19
                X1      X2     X3    X4      X5     X6
x.1        -0.0316  -0.342  -1.44  1.42  -0.446  0.042
Predicted  -1.6335  -0.990   1.29 -4.64  -0.773 -1.786
y         -16.7876 -25.954 -14.67 -2.29 -28.118  7.731
Residual  -15.1541 -24.964 -15.96  2.35 -27.344  9.517

Sum of squares = 1951    Mean square = 325    n = 6

fold 2
Observations in test set: 5 11 12 14 15 16 20
              X1     X2    X3    X4      X5      X6      X7
x.1        0.472  0.282  2.20  1.75   0.253 -0.0938  0.1543
Predicted -5.089 -5.385 -2.40 -3.10  -5.431 -5.9707 -5.5842
y         -5.894 -8.855 -7.32  2.88 -16.414 -3.0530  0.0434
Residual  -0.805 -3.470 -4.92  5.97 -10.983  2.9177  5.6276

Sum of squares = 233    Mean square = 33.3    n = 7

fold 3
Observations in test set: 1 2 3 8 13 17 18
              X1     X2    X3       X4     X5      X6        X7
x.1        0.429  1.925  0.31  -0.0194  -1.45  -0.836   0.00308
Predicted -8.592 -0.873 -9.20 -10.9030 -18.28 -15.117 -10.78682
y         11.045 -8.562  6.64 -14.6833   6.95   0.873   1.41586
Residual  19.637 -7.689 15.84  -3.7803  25.23  15.990  12.20268

Sum of squares = 1751    Mean square = 250    n = 7

Overall ms
197

########################################################
Note that the model y ~ x.1 + x.2 + x.3 produces an overall ms of 197.

> myformula <- formula("y ~ x.1 + x.2")
> cv.lm(data, myformula, plotit=FALSE, printit=TRUE)
[output identical to the above, character for character, again ending in]
Overall ms
197

########################################################
Note that the model y ~ x.1 + x.2 ALSO produces an overall ms of 197.

> myformula <- formula("y ~ x.1")
> cv.lm(data, myformula, plotit=FALSE, printit=TRUE)
[again identical output]
Overall ms
197

########################################################
Note that the model y ~ x.1 ALSO produces an overall ms of 197.

Three different linear models give three identical cross-validation mean squares!?

Eager to know why, I read the source of cv.lm() and found that the residuals in every fold are derived from a model with only one predictor, whatever formula is supplied. (The ANOVA table above also hints at this: it reports a single term, xv, with 1 df, regardless of the formula.) Is this a bug, or am I misunderstanding something about cv.lm()?

Li Junjie

--
Junjie Li, [EMAIL PROTECTED]
Undergraduate in DEP of Tsinghua University

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
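P.S. For comparison, here is a minimal base-R sketch of 3-fold cross-validation that does refit the supplied formula in every fold. This is my own illustration, not DAAG code; the helper name cv.ms and the random fold assignment are my own invention:

```r
## Manual 3-fold CV mean square, refitting the given formula each fold.
## Base R only; cv.ms is a made-up helper, not part of DAAG.
set.seed(1)
xx  <- matrix(rnorm(20 * 3), ncol = 3)
bb  <- c(1, 2, 0)
yy  <- xx %*% bb + rnorm(20, 0, 10)
dat <- data.frame(y = yy, x = xx)       # columns: y, x.1, x.2, x.3

cv.ms <- function(form, data, k = 3) {
  folds <- sample(rep(1:k, length.out = nrow(data)))  # random fold labels
  ss <- 0
  for (i in 1:k) {
    fit  <- lm(form, data = data[folds != i, ])       # fit on training folds
    pred <- predict(fit, newdata = data[folds == i, ])# predict held-out fold
    ss   <- ss + sum((data$y[folds == i] - pred)^2)
  }
  ss / nrow(data)                                     # overall mean square
}

set.seed(2); cv.ms(y ~ x.1 + x.2 + x.3, dat)
set.seed(2); cv.ms(y ~ x.1 + x.2,       dat)
set.seed(2); cv.ms(y ~ x.1,             dat)
```

With the same folds (same seed) but the full formula actually used in each fit, the three models no longer produce identical overall mean squares, unlike the cv.lm() output above.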