A new paper adds to the continuing discussion of research practices in psychology. The paper (citation below), in press at Psychonomic Bulletin and Review by Gregory Francis, analyzes the last several years of published papers in Psychological Science, a premier outlet in psychology, and in essence asks if there is “too much success” in the reported studies.
The analysis uses the “test for excess significance” (TES) (Ioannidis & Trikalinos, 2007). The intuition is that if you run some number of experiments – N – measuring an effect of a certain size, then it is possible to compute how likely it is that all N experiments reject the null. So, if the odds of getting an effect are, say, one in three, given the power to find the effect, then the chance of getting two such effects is the product of these probabilities, or one in nine. If one finds more successful rejections of the null than one would expect, given the power to reject the null, this suggests that something is amiss. From the analysis alone one can’t say where the excess success comes from, only that there is a bias in favor of positive results. According to Francis, the cutoff for the value of the TES is 0.1. As he puts it: “A power probability below this criterion suggests that the experiment set should be considered biased, and the results and conclusions are treated with skepticism.”
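The arithmetic behind this intuition can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the experiments are independent; the function name and the example powers are hypothetical, not taken from Francis’s analysis:

```python
# A minimal sketch of the intuition behind the "test for excess
# significance" (TES): under independence, the probability that
# every experiment in a set rejects the null is the product of
# the individual experiments' powers.
from math import prod

def joint_success_probability(powers):
    """Probability that all experiments succeed, given each
    experiment's power, assuming independence."""
    return prod(powers)

# The example from the text: power of one in three for each of
# two experiments gives a joint probability of one in nine.
print(joint_success_probability([1/3, 1/3]))  # ≈ 0.111

# Francis's criterion: a joint probability below 0.1 suggests the
# experiment set is biased in favor of positive results.
TES_CUTOFF = 0.1
print(joint_success_probability([1/3, 1/3, 1/3]) < TES_CUTOFF)  # True (1/27 ≈ 0.037)
```

Note that with a power of one in three, even a set of three uniformly “successful” experiments already falls below the 0.1 criterion.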
He ran the TES on papers published between 2009 and 2012 (inclusive) that reported four studies or more – the minimum number for the TES analysis – and found that 82% of the 44 papers that met the inclusion criteria had values less than the cutoff, suggesting a substantial degree of “excess success” in the journal.
I’m confident that the paper will stimulate a great deal of discussion. My interest for the remainder of this post is in a possible pattern in the TES data. When I first read the paper, my eye was caught (slightly ironically) by the short title of one of the papers investigated, The Wolfpack Effect, by my friend Brian Scholl and colleagues, which I wrote a little post about around the time it came out. This paper was one of the eight that surpassed the .1 threshold.
I looked a bit more closely into some of the others that similarly had TES values above .1. The paper with the largest TES value, .426, was also in the area of perception, looking at how people can quickly assign stimuli to categories (e.g., “animal”). The next largest TES value, .348, belonged to another perceptual study, having to do with the way that objects are represented in the visual system. Two other papers had to do with, first, another effect in vision – how the color of light affects visual processing of fear-inducing stimuli – and, second, an effect in audition.
So five of the eight successes, as indexed by TES, are from the field of perception. The other three were not, having to do with predictors of subjective well-being, reducing prejudice, and appreciation of others’ help. One paper in the area of perception – about visual rivalry – didn’t fare as well. Neither did a paper looking at the possibility that people see objects they want as being closer to them.
So perception didn’t run the table, but, still, without looking very closely at all the papers in question, it seemed to me that the low-to-medium level perception work distinguished itself in the analyses. (I might add that another paper (TES = .036) purported to show that when “religious themes were made implicitly salient, people exercised greater self-control, which, in turn, augmented their ability to make decisions in a number of behavioral domains that are theoretically relevant to both major religions and humans’ evolutionary success.”)
In any case, from the results that Francis reports, I don’t think any strong inferences can be drawn. To my eye, it looks like perceptual work does better than the other areas, but more systematic work will need to be done.