On Mon, 27 Jun 2016 13:53:03 -0700,  Christopher Green wrote:
Voilà! :-)

With all due respect to Chris Green, is it possible for the
anti-Null Hypothesis Statistical Testing (anti-NHST) folks to
come up with a new, possibly useful criticism that (a) hasn't
been made numerous times by many people, (b) hasn't already
been shown to be wrong, with the counter-arguments then ignored,
and (c) doesn't come packaged with new methods (e.g., confidence
intervals) that are actually worse than the problems they are
supposed to solve?  Really, folks, this is getting tiresome.

There is one piece of nonsense in the article that I have to
address.  Let me quote:

|Meehl contrasted this with the situation in physics, where
|researchers set out in advance to test a specific result that
|they expect their data to reflect (rather than being satisfied
|with any effect that is "not zero").

Structural Equation Modeling (SEM) existed in Meehl's time --
even earlier in the form of path analysis -- and the hypothesis
that it tests is not that an effect is "zero"; rather, it tests
whether the relationships specified in the model adequately fit
some criterion, such as a pattern in a covariance matrix (that is,
the covariance matrix implied by the model does not differ
significantly from the sample covariance matrix -- the chi-square
test examines the discrepancies between the two, and if the model
fits, the chi-square statistic should be close to its degrees of
freedom, not zero, in addition to meeting other criteria for
goodness of fit).  Meehl and other anti-NHST folks seem to be
unaware of this, though one has to give Meehl a break because he
probably didn't know about maximum likelihood factor analysis
(the form of SEM that was around in 1967).  The other folks should
get on the SEMnet mailing list.
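
For anyone who hasn't seen it written out, that chi-square can be
computed directly from the sample and model-implied covariance
matrices.  A minimal sketch in Python (the function and the inputs
are my own illustration, not taken from any SEM package):

# Likelihood-ratio chi-square for a covariance-structure model:
# compares the model-implied covariance matrix to the sample one;
# under a correct model the statistic should be near its df.
import numpy as np
from scipy import stats

def sem_chisquare(S, Sigma_model, n_obs, n_free_params):
    """S: p x p sample covariance matrix
    Sigma_model: p x p model-implied covariance matrix (from a fitted model)
    n_obs: sample size; n_free_params: parameters estimated by the model."""
    p = S.shape[0]
    f_ml = (np.log(np.linalg.det(Sigma_model))
            + np.trace(S @ np.linalg.inv(Sigma_model))
            - np.log(np.linalg.det(S)) - p)          # ML fit function
    chi2 = (n_obs - 1) * f_ml
    df = p * (p + 1) // 2 - n_free_params
    return chi2, df, stats.chi2.sf(chi2, df)

# A saturated "model" reproduces S exactly, so chi2 = 0; a badly
# misspecified model gives a chi2 far above its degrees of freedom.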


SEM is often used with nonexperimental data, but it can also be
used to test data from experimental studies; it often doesn't
make sense to do this, though, because in an experiment the
independent variables are usually measured without error, so there
is no need for a measurement model for the IV.  (People doing
regression analysis with nonexperimental predictors, by contrast,
have to figure out how to deal with the "errors in the predictors"
problem, which violates one of the basic assumptions of multiple
regression.)
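
A small simulation (my own illustration, not from any textbook) of
what the "errors in the predictors" problem does to a regression
slope:

# When the predictor X is measured with error, the estimated slope
# is biased toward zero (attenuation), violating the usual
# assumption that X is fixed and error-free.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_slope = 1.0

x_true = rng.normal(0.0, 1.0, n)            # error-free predictor
y = true_slope * x_true + rng.normal(0.0, 1.0, n)

for error_sd in (0.0, 0.5, 1.0):            # increasing measurement error in X
    x_obs = x_true + rng.normal(0.0, error_sd, n)
    slope_hat = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
    # Classical attenuation: slope_hat ~ true_slope * var(X) / (var(X) + error var)
    print(f"error SD = {error_sd:.1f}  estimated slope = {slope_hat:.3f}")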

Whether a researcher tests a hypothesis that involves the
comparison of two or more means, the correlation among a set of
variables, whether a particular model fits a covariance pattern,
or whether it accounts for the nested effects of, say, students
within classrooms within schools within districts (as done in
multilevel analysis) -- all of these depend upon the theory or
hypotheses that one is attempting to test.  If one assumes that
the research question, or the research design used to address
that question, determines what statistical analysis one will do,
then I guess it's easier for some people to complain about the
statistical analyses that are conducted instead of the lousy
theories they are testing because of their ignorance of the
phenomena they are studying -- isn't that the real problem?

|Thus it gets more difficult for them to confirm the theory as
|measurements get better.

First, "confirm" seems to imply a "verificationist" perspective,
that is, the results of a single analysis "prove" that a theory
is true.  Do I really need to explain how naive this is (perhaps
one needs to review their Popper or better understand how
Modus Ponens and Modus Tollens operate in evaluating
evidence for a hypothesis/theory?).  Second, let's assume
a simple measurement model for a dependent variable Y:

Y = mu + com + spec + u

where
mu = the population mean of the probability distribution of Y,
com = the portion of Y that is shared with other Y variables, that
is, a systematic component; when expressed as a variance it is the
"common variance" (possibly R^2, the squared multiple correlation
of this Y with the other Y variables),
spec = specific variance, that is, systematic variance that is
limited to this Y and a small number of other Y's (if Y refers to
performance on an academic test, com would reflect "g" or general
intelligence while spec would reflect knowledge of the specific
area being tested, say, math -- other Y's might measure verbal
ability or spatial ability, and ability in those areas would be
expressed as specific variance), and
u = the unique component, consisting of a systematic part plus a
random error part e.  The systematic part of u may refer to effects
such as the conditions of testing, time of day, etc., that affect
the person's Y value; e is random error that follows some
probability distribution and provides a reference for determining
error variance.
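
A toy simulation of this decomposition (the variance values are
mine, chosen only for illustration):

# Y = mu + com + spec + u, where u contains a systematic
# "conditions of testing" piece plus random error e.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

mu = 100.0                               # population mean of Y
com = rng.normal(0.0, 8.0, n)            # common component shared with other Y's ("g")
spec = rng.normal(0.0, 4.0, n)           # specific component (e.g., math knowledge)
u_systematic = rng.normal(0.0, 2.0, n)   # unique systematic part (testing conditions, etc.)
e = rng.normal(0.0, 5.0, n)              # random error
u = u_systematic + e

Y = mu + com + spec + u

# Because the components were generated independently, the total
# variance is (approximately) the sum of the component variances.
print(np.var(Y), np.var(com) + np.var(spec) + np.var(u))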

In the classic one-way, two-level between-subjects design, we would
add a component to the measurement model, call it alpha, where
alpha = mu1 - mu2
with
mu1 = the population mean associated with Tx 1 (placebo), and
mu2 = the population mean associated with Tx 2 (treatment).

Given the above, can someone explain how better measurement would
make confirming -- or, more correctly, disconfirming -- the null
hypothesis more difficult?
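
For the skeptical, a quick simulation (mine, using the two-group
setup above) showing that with alpha = mu1 - mu2 held fixed,
shrinking the error variance makes rejecting the null easier, not
harder:

# Rejection rate of H0: mu1 = mu2 as measurement error shrinks.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu1, mu2 = 100.0, 103.0              # placebo vs. treatment population means
n_per_group = 30
n_sims = 5_000

for error_sd in (15.0, 10.0, 5.0):   # smaller error SD = better measurement
    rejections = 0
    for _ in range(n_sims):
        g1 = rng.normal(mu1, error_sd, n_per_group)
        g2 = rng.normal(mu2, error_sd, n_per_group)
        _, p = stats.ttest_ind(g1, g2)
        rejections += p < 0.05
    print(f"error SD = {error_sd:4.1f}  rejection rate = {rejections / n_sims:.2f}")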

Oh, just a couple of points on Meehl (1967):
(1) Meehl engages in what Gerd Gigerenzer would claim is a presentation
of the "hybrid model of statistical inference", that is, mixing Fisherian
inference and assumptions with Neyman-Pearson inference and assumptions.
On page 107, when Meehl talks about statistical power, he apparently
fails to realize that Fisher never believed in the concept of
statistical power because he did not believe in the statistical
inference framework of Neyman and Pearson.  You can't have Type II
errors if you don't believe in them, and without Type II errors you
can't have statistical power.  Thus, Meehl's claim that "little
emphasis has been placed on statistical power" indicates that Meehl
does not understand that if you are a Fisherian, you wouldn't talk
about it.  The real question that Meehl should address is why people
would accept the Neyman-Pearson approach to inference instead of the
Fisherian one.

(2) Also on page 107, Meehl refers to Jack Cohen's (1962) review of
the statistical power of studies published in the Journal of
Abnormal and Social Psychology.  I guess that Meehl (as well as
Jack) can be forgiven for not distinguishing between "retrospective
power analysis" and "prospective power analysis" because, even
though the distinction seems obvious, it was not yet appreciated
that retrospective power analyses are not really valid; for one
argument on this point, see:
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The
pervasive fallacy of power calculations for data analysis.
The American Statistician, 55(1), 19-24.
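
To see why, here is a small numerical illustration (mine, not
reproduced from Hoenig & Heisey) of the point that "observed power"
for a two-sided z-test is just a re-expression of the p-value, so it
can never tell you more than the p-value already did:

# "Observed power" computed by plugging the observed |z| back in as
# if it were the true effect; it is a monotone function of p alone.
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)

for p in (0.20, 0.10, 0.05, 0.01):
    z_obs = norm.ppf(1 - p / 2)                   # |z| that yields this p
    observed_power = (norm.sf(z_crit - z_obs)
                      + norm.cdf(-z_crit - z_obs))
    print(f"p = {p:.2f}  ->  'observed power' = {observed_power:.2f}")
# Note that p = 0.05 always gives "observed power" of about 0.50.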

Jack had advocated that researchers do a prospective power analysis,
that is, identify the effect size one wants to detect, how much
statistical power one wants, and what sample sizes will be needed
(when comparing two independent means, a fourth component needs to
be specified, namely the Type I error rate, which is usually set at
alpha = 0.05).  Unfortunately, many researchers even in this
situation will not know which relevant population distribution to
use (default: assume normal), what effect size to expect, what an
acceptable level of statistical power is (.80? .90? .95? .99?), or
how many subjects/participants one might be able to get.
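
For the record, the arithmetic itself is a one-liner these days; a
sketch using statsmodels (the effect size, alpha, and power values
below are placeholders, not recommendations):

# Prospective power analysis for two independent means: specify the
# effect size, Type I error rate, and desired power, then solve for
# the sample size per group.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,   # Cohen's d to detect
                                   alpha=0.05,        # Type I error rate
                                   power=0.80,        # desired power
                                   alternative='two-sided')
print(f"about {n_per_group:.0f} participants per group")  # roughly 64 per group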

As should be evident, the problem is not that a t-test will be used
and one will test the null hypothesis of zero difference; the
problem is that most researchers don't know enough about the
phenomena they are researching to make good decisions about them
(this is particularly problematic in the Neyman-Pearson framework).
*THAT* is the flaw at the heart of psychological research.  But some
people feel compelled to beat dead horses even after their dust has
blown away.

Self-promotional bit:  for more on the use of new statistical
rituals (as well as objections to them) in place of old rituals,
see my review of Geoff Cumming's book "The New (Sic!) Statistics":
https://www.researchgate.net/publication/236866116_New_statistical_rituals_for_old


On Jun 27, 2016, at 5:49 PM, Rick Froman wrote:
Guess which TIPster has the first article in the Chronicle Review this week which includes a clear and concise look at the state of null hypothesis
testing in psychological research?

http://chronicle.com/article/The-Flaw-at-the-Heart-of/236916

NOTE: this is behind a paywall.  If your library doesn't subscribe,
I'm sure that Chris would be more than happy to supply a reprint. ;-)
Or email me. ;-)

-Mike Palij
New York University
m...@nyu.edu

P.S.  I really do like Chris Green, I just treat him badly. ;-)

P.P.S. I did find Meehl going Bayesian on page 111 amusing ;-)

