TY - JOUR
T1 - Reevaluating excess success in psychological science
AU - van Boxtel, Jeroen J.A.
AU - Koch, Christof
PY - 2016/10/1
Y1 - 2016/10/1
N2 - Francis (Psychonomic Bulletin & Review, 21, 1180–1187, 2014) recently claimed that 82% of articles with four or more experiments published in Psychological Science between 2009 and 2012 cannot be trusted. We critique Francis’ analysis and point out the dependence of his approach on including the appropriate experiments and significance tests. We focus on one of the articles (van Boxtel & Koch, in Psychological Science, 23(4), 410–418, 2012) flagged by Francis and show that the inappropriate inclusion of experiments and tests has led Francis to mistakenly flag this article. We found that decisions about whether to include certain tests potentially affect 34 of the 44 articles analyzed by Francis. We further performed p-curve analyses on the articles discussed in Francis’ analysis. We found that 9 of 44 studies showed significant evidential value, 11 studies showed insufficient evidential value, and 1 study showed evidence of p-hacking. Our reevaluation is important, because some researchers may have gained the false impression that none of the quoted articles in Psychological Science can be trusted (as stated by Francis). The analysis by Francis is most likely insufficient to warrant this conclusion for some articles and certainly is insufficient with respect to the study by van Boxtel and Koch (Psychological Science, 23, 410–418, 2012).
AB - Francis (Psychonomic Bulletin & Review, 21, 1180–1187, 2014) recently claimed that 82% of articles with four or more experiments published in Psychological Science between 2009 and 2012 cannot be trusted. We critique Francis’ analysis and point out the dependence of his approach on including the appropriate experiments and significance tests. We focus on one of the articles (van Boxtel & Koch, in Psychological Science, 23(4), 410–418, 2012) flagged by Francis and show that the inappropriate inclusion of experiments and tests has led Francis to mistakenly flag this article. We found that decisions about whether to include certain tests potentially affect 34 of the 44 articles analyzed by Francis. We further performed p-curve analyses on the articles discussed in Francis’ analysis. We found that 9 of 44 studies showed significant evidential value, 11 studies showed insufficient evidential value, and 1 study showed evidence of p-hacking. Our reevaluation is important, because some researchers may have gained the false impression that none of the quoted articles in Psychological Science can be trusted (as stated by Francis). The analysis by Francis is most likely insufficient to warrant this conclusion for some articles and certainly is insufficient with respect to the study by van Boxtel and Koch (Psychological Science, 23, 410–418, 2012).
KW - Statistical inference
KW - Statistics
UR - http://www.scopus.com/inward/record.url?scp=84957999076&partnerID=8YFLogxK
U2 - 10.3758/s13423-016-1010-0
DO - 10.3758/s13423-016-1010-0
M3 - Article
AN - SCOPUS:84957999076
SN - 1069-9384
VL - 23
SP - 1602
EP - 1606
JO - Psychonomic Bulletin and Review
JF - Psychonomic Bulletin and Review
IS - 5
ER -