On Fri, 28 Aug 2015 11:18:21 -0700, Rick Froman wrote:
It is true that we only got a D+ :( but 68% is quite a bit higher :)
than some of the estimates of reproducibility I have heard
bandied about (which are close to zero).
Rick, I think you are manifesting a positivity bias. ;-) That is,
I think you use the wrong number (68%) to represent the results.
Below I quote the results section from the "structured abstract"
of the Science article linked to by Karl:
|We conducted replications of 100 experimental and correlational
|studies published in three psychology journals using high-powered
|designs and original materials when available. There is no single
|standard for evaluating replication success. Here, we evaluated
|reproducibility using significance and P values, effect sizes,
|subjective assessments of replication teams, and meta-analysis
|of effect sizes. The mean effect size (r) of the replication effects
|(Mr = 0.197, SD = 0.257) was half the magnitude of the mean
|effect size of the original effects (Mr = 0.403, SD = 0.188),
|representing a substantial decline. Ninety-seven percent of original
|studies had significant results (P < .05). Thirty-six percent of
|replications had significant results; 47% of original effect sizes
|were in the 95% confidence interval of the replication effect size;
|39% of effects were subjectively rated to have replicated the
|original result; and if no bias in original results is assumed,
|combining original and replication results left 68% with statistically
|significant effects. Correlational tests suggest that replication success was
|better predicted by the strength of original evidence than by
|characteristics of the original and replication teams.
The key points above, IMHO, are:
(1) 97% of the original studies had statistically significant results
while only 36% of the replications had statistically significant
results.
In other words, only about 37% (36/97 * 100) of the original significant
results were replicated (assuming none of the original nonsignificant
results became significant on replication).
(2) The original studies Effect Size (ES), represented by Pearson r,
had a mean r = 0.403 while the replication ES had a mean r= 0.197.
The replicated ES was less than half of the original ES.
(3) By subjective evaluations, only 39% of the original effects were
replicated.
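The arithmetic behind points (1) and (2) can be checked in a few lines. This is just a sketch using the figures quoted in the abstract above (97% and 36% significance rates, mean r of 0.403 vs. 0.197); the variable names are mine, not the authors':

```python
# Figures taken directly from the quoted results section of the abstract.
orig_significant = 0.97   # fraction of original studies with P < .05
rep_significant = 0.36    # fraction of replications with P < .05
orig_mean_r = 0.403       # mean effect size (r) of original studies
rep_mean_r = 0.197        # mean effect size (r) of replications

# Point (1): share of originally significant results that replicated,
# assuming no originally nonsignificant result became significant.
replication_rate = rep_significant / orig_significant
print(f"Replication rate: {replication_rate:.0%}")

# Point (2): replication effect sizes relative to the originals.
es_ratio = rep_mean_r / orig_mean_r
print(f"Effect-size ratio: {es_ratio:.2f}")
```

Running it gives a replication rate of about 37% and an effect-size ratio of about 0.49, i.e., the replication effects were indeed less than half the size of the originals.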
The full article is available at the following URL:
http://www.sciencemag.org/content/349/6251/aac4716.full
The suggested citation for the article is:
Open Science Collaboration (28 August 2015). Estimating the
reproducibility of psychological science. Science, 349(6251),
aac4716. [DOI:10.1126/science.aac4716]
Brian Nosek is identified as the corresponding author and there
appears to be about 70+/-20 other authors (I didn't count) representing
125 institutions (Science Mag did the counting).
So, 68% is one way to represent the results or one could say that
only about 39% of the original findings were replicated. YMMV.
-Mike Palij
New York University
[email protected]
P.S. I have skimmed the full article and what I refer to above as
Pearson r may be what the authors call "Spearman's rank-order
correlations of reproducibility". Have to read the full article to
determine what the difference is.
------------Original Message----------------
On Friday, August 28, 2015 12:59 PM, Karl Louis Wuensch wrote:
This article is going on the reading list for my grad
students
in stats. Once we have covered power and publication biases this
article should, I hope, lead to some lively discussion.
http://www.sciencemag.org/content/349/6251/aac4716