On Jun 21, 2010, at 10:27 AM, David Riebel wrote:

I am using the lm function in R to fit several linear models to a
fair-sized dataset (~160 collections of ~1000 data points each).  My
data have intrinsic, systematic uncertainty much greater than the
measurement errors on any individual point.  My thought is to use the
residuals of my linear fits to quantify this intrinsic uncertainty, but
I am puzzled over the correct interpretation of R's output.

I have attached plots of the fit and the residuals to one of my
sub-groups, for illustration. By eye, the overwhelming majority of the
residuals are within +- 0.4, and I would therefore expect the standard
error of the residuals to be ~0.2. However, the output from lm does not
show this:

Crack open a basic regression text. The standard errors in the coefficient table (more completely, the standard errors of the estimates) refer to the parameters, not the residuals. They depend on SS(resid)/(n - p), where p is the number of fitted coefficients, but there are obviously other components in the calculation. Furthermore, you have complicated matters by adding a weights term, which will affect your estimates in a manner that we cannot predict, since you did not provide the full data.
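To make the link between SS(resid) and the coefficient standard errors concrete, here is a minimal sketch for an unweighted fit. The data are simulated and every variable name is made up for illustration; nothing here comes from the original dataset.

```r
# Simulated data, purely illustrative (not the poster's data)
set.seed(2)
x   <- runif(200)
y   <- 1 + 2 * x + rnorm(200, sd = 0.3)
fit <- lm(y ~ x)

X      <- model.matrix(fit)                       # design matrix (intercept, x)
sigma2 <- sum(resid(fit)^2) / df.residual(fit)    # SS(resid) / (n - p)
se     <- sqrt(diag(sigma2 * solve(crossprod(X))))  # sqrt of diag of sigma^2 (X'X)^-1

# Hand-computed standard errors next to the ones summary() reports
cbind(by_hand = se, from_summary = coef(summary(fit))[, "Std. Error"])
```

The two columns should agree, which shows that the residual variance is one ingredient and the design matrix is the other.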


summary(ofit)

Call:
lm(formula = omag ~ oper, weights = (1/oerr))

Residuals:
    Min       1Q   Median       3Q      Max
-3.32185 -0.41181  0.03983  0.40041  2.52971

Coefficients:
           Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.52847    0.03979   490.8   <2e-16 ***
oper        -4.25297    0.02101  -202.4   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6705 on 2287 degrees of freedom
Multiple R-squared: 0.9471, Adjusted R-squared: 0.9471
F-statistic: 4.097e+04 on 1 and 2287 DF,  p-value: < 2.2e-16

The plot thickens when I examine the residuals themselves:
summary(resid(ofit))
    Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.611800 -0.095720  0.010200  0.005954  0.101100  0.680700
sd(resid(ofit))
[1] 0.1533568

These numbers are much more what I see by eye. There really aren't any
residuals outside ~0.6, certainly nothing as large as 3.3!  The help
feature for lm tells me that the residuals are "the residuals, that is
response minus fitted values."  Exactly what I would expect.  As an
astronomer, my knowledge of statistics is rather "workman-like" if you
will, but to me, "Residual standard error" means "the standard deviation
of the residuals," but the lm output doesn't seem to agree with this.

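Probably because you added the weights argument. With weights, summary() on an lm object lists the weighted residuals (each residual multiplied by the square root of its weight) and computes the residual standard error from them, whereas resid() returns the unweighted residuals.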
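A small sketch of how the weights change what summary() prints, using simulated data. The variable names, the error range, and the scatter of 0.15 are all assumptions chosen for illustration; they are not taken from the original dataset.

```r
# Simulated data, purely illustrative (not the poster's data)
set.seed(1)
n   <- 500
x   <- runif(n, 0, 3)
err <- runif(n, 0.02, 0.1)                  # hypothetical per-point measurement errors
y   <- 19.5 - 4.25 * x + rnorm(n, sd = 0.15)
fit <- lm(y ~ x, weights = 1 / err)

w    <- weights(fit)                        # the weights passed to lm
raw  <- resid(fit)                          # response minus fitted values
wres <- sqrt(w) * raw                       # weighted residuals

# Residual standard error rebuilt from the weighted residuals
rse_by_hand <- sqrt(sum(w * raw^2) / df.residual(fit))

all.equal(unname(wres), unname(summary(fit)$residuals))  # same as the "Residuals:" block
all.equal(rse_by_hand, summary(fit)$sigma)               # same as "Residual standard error"
sd(raw)                                     # spread of the unweighted residuals
```

In this sketch sd(raw) stays near the simulated scatter, while the reported residual standard error is several times larger because each residual is scaled by the square root of its weight.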


I'd appreciate it if someone could clarify what's being output by the
summary function acting on an lm object.

Replies by e-mail preferred.

Thanks,


David Riebel
Graduate Research Assistant
Johns Hopkins University
Department of Physics and Astronomy

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
