I must admit that I don't understand some of the comments that have
been made about the Button et al article and what they were trying
to show.  I'll try to summarize some of the main points and address
what I think some people are questioning.

Main points:
(1)  Button et al conducted a "meta-meta-analysis", that is, they
searched the neuroscience literature for meta-analyses and, of the
256 reports they found, 48 met their inclusion criteria.  For these
48 meta-analyses they summarize the number of studies used in each
meta-analysis (i.e., k), the N of subjects used, an effect size
measure (either Cohen's d or an odds ratio), and whether a fixed
effects analysis (i.e., a constant effect size across studies) or a
random effects analysis (i.e., the effect sizes are treated as a
sample of effects) was used.  Their original contribution is that
post hoc or "observed" power analyses were then conducted on the
studies that were reported.  See Button et al's Table 1 for the list
of sources and the aforementioned values.  (A small calculation
sketch illustrating this kind of power computation follows the main
points below.)

(2) Button et al found that the "average" (it is unclear whether they
are using the mean or the median) power of the studies in the
meta-analyses was .21.  This can be considered the most important
point of the article.  Curiously, the distribution of power values
was bimodal, with one group of studies bunched up at the lower end
of the power values and seven studies that had power of about .90.
See their comments about this.

(3) An additional test that Button et al conducted was for an "excess
of statistical significance", a procedure that had been used
previously in biomedical research (see refs 70, 72, 73, and 74) and
by Francis in psychology (search the Tips archive for my mention of
Francis's work on Bem's experiments and on studies with disappearing
effects).  This test was significant (see page 5, column 1, 5th
para), indicating that there were more statistically significant
results than expected given the amount of power and the effect
sizes.

(4) Button et al noticed that the meta-analyses they used did not
include studies involving neuroimaging or animals, so they located
studies in these areas to analyze.  For neuroimaging, see ref 73 for
the details, but here too power was low: a median power of .08
across 461 individual studies.  For the animal studies, see their
Table 2 for the sample sizes needed to have power = .80 and .95 in
water maze and radial maze studies -- all of the actual samples were
too small to detect their effects.
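
To make the "observed" power idea concrete, here is a minimal Python
sketch using a normal approximation to the power of a two-sample
t test.  The effect sizes and sample sizes are invented for
illustration (they are not Button et al's Table 1 data), the
expected-vs-observed comparison only gestures at the excess
significance idea rather than reproducing the published test, and
the required-n calculation at the end parallels their Table 2.

# A minimal sketch of the power calculations discussed above.  The
# effect sizes (Cohen's d) and per-group sample sizes are invented
# for illustration -- they are NOT the values from Button et al's
# Table 1.
from scipy.stats import norm

def observed_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample t test (normal approx.)."""
    z_crit = norm.ppf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5   # noncentrality, equal group sizes
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

def required_n(d, power=0.80, alpha=0.05):
    """Approximate per-group n needed to reach the target power."""
    z_crit = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return 2 * ((z_crit + z_power) / d) ** 2

# Hypothetical studies from a meta-analysis: (reported d, per-group n)
studies = [(0.20, 15), (0.35, 20), (0.25, 12), (0.50, 30), (0.30, 18)]
powers = [observed_power(d, n) for d, n in studies]
print("observed power per study:", [round(p, 2) for p in powers])
print("mean observed power:     ", round(sum(powers) / len(powers), 2))

# Expected number of significant results if these d's were the true
# effects; comparing this with the observed count is the spirit of the
# "excess of statistical significance" test, not the exact procedure.
print("expected significant results:", round(sum(powers), 1),
      "out of", len(studies))

# As in their Table 2: per-group n needed for power .80 and .95 at d = .30
print("n per group for power .80:", round(required_n(0.30, 0.80)))
print("n per group for power .95:", round(required_n(0.30, 0.95)))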

So, summarizing the above, Button et al conducted a post hoc
power analysis of studies in neuroscience, comparable to other
studies done in the biomedical field (the critical co-author is
John Ioannidis, who was instrumental in developing this type
of analytical procedure), and found that the average power = .21.
This is smaller than what Cohen found in his examination of
power in psychology (Gigerenzer replicated Cohen's analysis
decades later and found little difference) and leads to the
conclusion that there is something wrong with the conduct of
research in neuroscience -- not that this is unique to
neuroscience, because Ioannidis has shown the same thing in
other biomedical fields (see:
http://www.mail-archive.com/tips@fsulist.frostburg.edu/msg07472.html )
but the power is lower here.

There seem to be some questions about some of the assertions that
Button et al make.  They begin their article with three points,
which I'll call "Secondary Points":

Secondary Points:
(1)  Low statistical power reduces the probability of detecting true
non-null results.  As they say on page 2, if a study has power = .20,
then out of 100 studies in which the null hypothesis is false, one
will reject the null in only about 20 of them.  It seems to me that
this is not a controversial point.

(2) The lower the power of a study, the less likely it is that a
statistically significant result actually reflects a true effect.
Button et al refer to this probability as the "positive predictive
value" (PPV), and it may make more sense when examined in its
original context, that of making a diagnostic decision, say, with a
lab test.  Wikipedia has a useful example of PPV in the lab test
context, and the key thing to notice is that the calculation
includes the prevalence.  For example, if you are testing for, say,
HIV, the probability that a person who tests positive is actually
infected depends upon how many people in their "group" have it
(i.e., the prevalence of the illness in the group).  If we know that
we are testing a gay male, that person has a relatively high prior
probability of being infected, so a positive result is likely to be
a true positive.  If we know that we are testing a lesbian, that
person would have a much lower prior probability, and a positive
result might well be a false positive.
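
Here is a tiny numerical illustration of that prevalence point,
using the standard diagnostic-test form of PPV.  The sensitivity,
specificity, and prevalence values are made up for illustration and
are not taken from any real HIV test.

# Diagnostic-test PPV: P(infected | positive test).  All values below
# are invented for illustration.
def diagnostic_ppv(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prevalence in (0.10, 0.001):   # high- vs low-prevalence group
    print(f"prevalence = {prevalence}: "
          f"PPV = {diagnostic_ppv(0.99, 0.98, prevalence):.3f}")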

Button et al modify the procedure described in Wikipedia
(and most books on epidemiology; see Jekel's text for a
more complete presentation -- look on books.google.com)
and define PPV as
PPV = ((1 - Beta)*R) / ((1 - Beta)*R + alpha)

where alpha is the Type I error rate (usually = .05),
Beta is the Type II error rate, 1 - Beta is statistical power, and
R is the pre-study odds that the effect being investigated is
actually non-null (the text is a little confusing because R is
referred to as an odds ratio early on, but this is wrong; it is
simply the prior odds).  A short numerical sketch of this formula
appears at the end of this point.

The prior odds can be calculated from a meta-analysis
since the meta-analysis will indicate how many studies
out of the total are statistically significant.  I'd like to point
out that this is mainly a technical point and not really essential
to understanding the main point behind the article.
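
To see how power drives PPV in the research context, here is a small
sketch of the corrected formula above.  The values of R are
assumptions chosen only to show the pattern, and .21 is simply the
"average" power figure reported in the article.

# PPV = ((1 - Beta)*R) / ((1 - Beta)*R + alpha), per Button et al.
# The pre-study odds (R) values are assumptions for illustration only.
def ppv(power, R, alpha=0.05):
    """Probability that a significant result reflects a true effect."""
    return (power * R) / (power * R + alpha)

for R in (0.25, 1.0, 4.0):          # hypothetical pre-study odds
    for power in (0.21, 0.80):      # article's figure vs. conventional target
        print(f"R = {R:>4}, power = {power:.2f} -> PPV = {ppv(power, R):.2f}")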

(3)  When an "underpowered" study is statistically significant,
the effect size will usually be exaggerated, especially in small
samples, a result referred to as the "winner's curse".  The real
problem here is that replications of the study that use the same
sample size as the initial study are likely to produce
nonsignificant results, because the true effect size is smaller
than the estimate provided by the initial study.  I leave it to the
interested reader to review Button et al's presentation of this
point on pages 2-3.
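
A quick simulation can make points (1) and (3) concrete: with true
power around .15-.20, only a small fraction of studies come out
significant, and the significant ones overestimate the true effect.
This is only an illustrative sketch with made-up parameters, not
Button et al's procedure.

# Illustrative simulation of low power and the "winner's curse".
# A true effect of d = 0.30 with n = 20 per group gives power of
# roughly .15; all parameters are made up for illustration.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
true_d, n_per_group, n_studies = 0.30, 20, 1000

observed_d, significant = [], []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_d, 1.0, n_per_group)
    t, p = ttest_ind(treated, control)
    # Pooled-SD Cohen's d for this simulated study
    pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
    observed_d.append((treated.mean() - control.mean()) / pooled_sd)
    significant.append(p < 0.05)

observed_d = np.array(observed_d)
significant = np.array(significant)
print(f"proportion significant (approx. power): {significant.mean():.2f}")
print(f"mean observed d, all studies:         {observed_d.mean():.2f}")
print(f"mean observed d, significant studies: {observed_d[significant].mean():.2f}")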

Now, notice that the above three points could in fact be ignored,
because the key result is the meta-meta-analysis finding that
power = .21.  The above points explain Button et al's rationale,
but if one has difficulty with them, that should not get in the way
of the key result.  It should also be noted that this is just one
paper in a series on "meta-meta-analysis" -- see my post to Tips at
the link I provide above.

I think most people have had problems with the three points and
not with the actual meta-meta-analysis.  Remember that one can
always replicate their analyses if one is really concerned.

-Mike Palij
New York University
m...@nyu.edu


