My comment that your original result did “not seem to be right” comes from familiarity with “Orthogonal Regression”. I would recommend you read the article by Carroll and Ruppert, “The Use and Misuse of Orthogonal Regression in Linear Errors-in-Variables Models”, The American Statistician, Vol 50, No 1, Feb 1996, p 1. The other article you should read is Tan and Iglewicz, “Measurement-Methods Comparisons and Linear Statistical Relationship”, Technometrics, Vol 41, No 3, August 1999, p 192. Both deal with laboratory-type measurements. The second deals directly with your problem of comparing a new method of measurement against an older one. The first also discusses a method of estimating the variance due to equation (linear) error.

Statistics is nothing more than a scientific method of determining whether some observations and the conclusions deduced from them came about by pure chance (after R.A. Fisher). The design of experiments (which is what you should be doing) is a powerful tool for reducing the effects of chance on the observed data. At some point a model (or several models) involving the observations has to be constructed; for example, a linear regression between two observed variables. One of the models can then be taken as true under the null hypothesis, and statistics used to determine whether the model is likely to have occurred by chance. Note that statistics does not tell you which of the many models that fit are true; it only tells you which ones are probably false. My reference to gurus is about the very many models and methods of reducing data to a statistic (and the individuals associated with them) that can be found not to be false. In many cases the models and associated statistics are completely different and give different numerical results. It could very well be that log-transformed data fit just as well as untransformed data (neither rejected), as the sketch below illustrates. A designed experiment is needed to distinguish more clearly between the different models.
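
As a minimal sketch of that last point (Python; the simulated data, the OLS fits, and the Shapiro-Wilk residual check are my own hypothetical choices, not anything from the articles above):

```python
# Two competing models -- raw and log-transformed -- can both survive
# a goodness-of-fit check on the same data, so the data alone cannot
# say which model is "true"; only a designed experiment can separate them.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = np.linspace(1.0, 10.0, 50)
y = 2.0 * x + rng.normal(0.0, 0.5, size=x.size)   # hypothetical measurements

def fit_and_check(u, v):
    """OLS fit of v on u; returns the slope and a Shapiro-Wilk p-value
    for the residuals as a crude 'model not rejected' check."""
    res = stats.linregress(u, v)
    resid = v - (res.intercept + res.slope * u)
    return res.slope, stats.shapiro(resid).pvalue

s_raw, p_raw = fit_and_check(x, y)                    # untransformed model
s_log, p_log = fit_and_check(np.log(x), np.log(y))    # log-log model

print(f"raw:     slope={s_raw:.3f}  residual-normality p={p_raw:.2f}")
print(f"log-log: slope={s_log:.3f}  residual-normality p={p_log:.2f}")
# Both p-values typically land well above 0.05: neither model need be rejected.
```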

The correlation coefficient between X and Y can be obtained directly from the covariance matrix: the two diagonal terms are the observation variances (not the error variances), and the off-diagonal term is the covariance. The correlation coefficient can either be calculated or be estimated graphically by plotting the “cloud” of X-Y points and drawing the regression estimate of the slope through it. Turner, in “A Simple Example Illustrating a Well-Known Property of the Correlation Coefficient”, The American Statistician, Vol 51, No 2, May 1997, p 170, shows the form and the relationship between the covariance and the correlation coefficient. Professor Jan de Leeuw pointed out that Francis Galton observed an elliptical cloud of points in male height data (father and son) back in the 1880s, and that J.D. Hamilton Dickson reduced it to a correlation coefficient in 1886. So the technique is not new.
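
A minimal sketch of that computation (Python; the father/son height numbers are made up purely as a nod to Galton's example):

```python
# The correlation coefficient falls directly out of the 2x2 covariance
# matrix: diagonals are the observation variances, off-diagonal the covariance.
import numpy as np

rng = np.random.default_rng(0)
father = rng.normal(170.0, 7.0, size=200)              # heights in cm (hypothetical)
son = 0.5 * father + rng.normal(85.0, 5.0, size=200)   # regression toward the mean

C = np.cov(father, son)     # C[0,0], C[1,1]: variances; C[0,1]: covariance
r = C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])

print(f"r = {r:.3f}")
print(f"np.corrcoef agrees: {np.corrcoef(father, son)[0, 1]:.3f}")
```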

The methods of Total Least Squares, which are a set of very powerful analytical tools, have been investigated by the numerical analysis community. The statistical community has not yet worked out the distributional properties of the resulting error estimates. For example, you can do multivariate regressions involving many simultaneous Y values (such as absorption (Y) versus wavelengths and specimen compositions (X)).
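
For the simplest case (one X, one Y), the numerical-analysis formulation of Total Least Squares reduces to an SVD of the centered data matrix; a minimal sketch follows (Python; the simulated data and error variances are my own assumptions):

```python
# Total Least Squares (orthogonal regression) for one X and one Y via the
# SVD: the fitted line's normal vector is the right singular vector that
# belongs to the smallest singular value of the centered matrix [X  Y].
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 40)                    # true underlying values
x = t + rng.normal(0.0, 0.3, size=t.size)         # X measured with error
y = 1.5 * t + rng.normal(0.0, 0.3, size=t.size)   # Y measured with error

A = np.column_stack([x - x.mean(), y - y.mean()])
_, _, Vt = np.linalg.svd(A)
v = Vt[-1]                # normal vector of the best-fit line
slope = -v[0] / v[1]      # line: v[0]*(x - xbar) + v[1]*(y - ybar) = 0
intercept = y.mean() - slope * x.mean()

print(f"TLS slope = {slope:.3f}, intercept = {intercept:.3f}")
# Caveat: this orthogonal fit is only defensible when the X and Y error
# variances are equal -- the misuse Carroll and Ruppert warn about.
```

The multivariate case (many simultaneous Y columns, as in the spectroscopy example) uses the same SVD machinery on a wider matrix; the single-predictor case shows the idea.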

DAHeiser