According to a meta-analysis reported in The New York Times, many of the psychology findings that proliferate across social media and casual conversation may be less solid than initially reported. After a yearslong effort to reproduce the results of 100 studies published in three leading psychology journals — Psychological Science, the Journal of Personality and Social Psychology, and the Journal of Experimental Psychology: Learning, Memory, and Cognition — the researchers found that more than half failed to hold up when re-tested.
The analysis was done by research psychologists, many of whom volunteered their time to double-check what they considered important work. Their conclusions, reported Thursday in the journal Science, have confirmed the worst fears of scientists who have long worried that the field needed a strong correction.
The vetted studies were considered part of the core knowledge by which scientists understand the dynamics of personality, relationships, learning and memory. Therapists and educators rely on such findings to help guide decisions, and the fact that so many of the studies were called into question could sow doubt in the scientific underpinnings of their work.
The project began in 2011, when Brian Nosek, a University of Virginia psychologist, decided to find out whether suspect science was a widespread problem. He and his team recruited more than 250 researchers, identified the 100 studies published in 2008, and rigorously redid the experiments in close collaboration with the original authors.
The new analysis, called the Reproducibility Project, found no evidence of fraud or that any original study was definitively false. Rather, it concluded that the evidence for most published findings was not nearly as strong as originally claimed.
Dr. John Ioannidis, a director of Stanford University’s Meta-Research Innovation Center, who once estimated that about half of published results across medicine were inflated or wrong, noted the proportion in psychology was even larger than he had thought. He said the problem could be even worse in other fields, including cell biology, economics, neuroscience, clinical medicine, and animal research.
Granted, even the analysis that found these other studies to be weak (though not necessarily false) has itself been called into question, both in its execution and in its underlying premise.
“There’s no doubt replication is important, but it’s often just an attack, a vigilante exercise”, said Norbert Schwarz, a professor of psychology at the University of Southern California.
Dr. Schwarz, who was not involved in any of the 100 studies that were re-examined, said that the replication studies themselves were virtually never evaluated for errors in design or analysis.
Dr. Nosek’s team addressed this complaint in part by requiring the researchers attempting to replicate the findings to collaborate closely with the original authors, asking for guidance on design, methodology and materials. Most of the replications also included more subjects than the original studies, giving them more statistical power.
Strictly on the basis of significance — a statistical measure of how unlikely a result would be if chance alone were at work — 35 of the studies held up, and 62 did not. (Three were excluded because their significance was not clear.) The overall “effect size”, a measure of the strength of a finding, dropped by about half across all of the studies. Yet very few of the redone studies contradicted the original ones; their results were simply weaker.
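The notion of “effect size” can be made concrete with a few lines of code. Below is a minimal Python sketch of one common effect-size measure, Cohen’s d (the difference between two group means, expressed in units of their pooled standard deviation). The data are entirely hypothetical, chosen only so that a small original sample shows a large apparent effect while a bigger replication sample shows a weaker one.

```python
import math
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Cohen's d: difference in group means, in units of the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * stdev(group_a) ** 2
                  + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical scores: a small original sample with a large apparent effect...
orig_treatment = [6.0, 6.5, 5.5, 7.0, 6.2, 6.8]
orig_control   = [5.0, 5.5, 4.8, 5.2, 5.0, 5.5]

# ...and a larger replication sample where the effect is weaker.
repl_treatment = [5.4, 5.2, 5.5, 5.0, 5.3, 5.6, 5.1, 5.4, 5.2, 5.3, 5.5, 5.2]
repl_control   = [5.1, 5.0, 5.2, 4.9, 5.1, 5.3, 5.0, 5.2, 5.1, 5.0, 5.2, 5.1]

d_orig = cohens_d(orig_treatment, orig_control)
d_repl = cohens_d(repl_treatment, repl_control)
print(f"original effect size    d = {d_orig:.2f}")
print(f"replication effect size d = {d_repl:.2f}")
```

This pattern, a strong effect in a small original sample that shrinks in a larger and better-powered replication, mirrors what the project reported: most redone studies did not contradict the originals, their effects were simply weaker.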
“We think of these findings as two data points, not in terms of true or false”, Dr. Nosek said.
For their part, the researchers behind the analysis agree that their finding is by no means an airtight, total refutation of the weak studies.
The project’s authors write that despite the painstaking effort to duplicate the original research, there could be differences in the design or context of the reproduced work that account for the different findings. Many of the original authors certainly agree.
In an email, Paola Bressan, a psychologist at the University of Padua and an author of one of the original studies, on mate preferences, identified several such differences — including that her sample of women was mostly Italian, not American psychology students — that she said she had forwarded to the Reproducibility Project. “I show that, with some theory-required adjustments, my original findings were in fact replicated,” she said.
These are the sorts of differences that themselves could be the focus of a separate study, Dr. Nosek said.
So, as usual, there is still a lot of work to be done. There is talk of applying this sort of analysis to other fields of science, which have similarly faced rising rates of retraction and, in turn, increased scrutiny of science as a whole (itself part of a wider trend of growing mistrust toward all sorts of institutions and authorities, from media and politics to religion and medicine).
In my view, such seemingly cynical developments are a sign of progress. The pursuit of science in its organized form is still a rather new development in human history, and there is doubtless much more work to be done in refining how we study and understand the world around us (especially complex systems like those of psychology, economics, and physics). Given the fallibility, bias, and corruptibility of humans (and consequently of their institutions), such flaws will always be with us.
The key is to be as vigilant and transparent as possible, allowing for constant scrutiny, analysis, discussion, and introspection (there is a reason why the best scientists are often cautious and low-key about their initial findings). It is far from foolproof — what human endeavor isn’t? — but it is the best we can do in any field of study.
For more on this issue, check out The Atlantic’s equally excellent piece.