The "Replication Crisis" (RC) is a nebulous term that gets thrown around by anyone looking to cast shade on science. Don't get me wrong, it is used by a lot of legitimate thinkers as well, but too often I see it used as a rhetorical tool to try to discredit scientific knowledge. The unspoken premise of these types of arguments is often, "if one pair of experiments found different results then all of science is a lie and you should buy this totally legitimate homeopathy cure. It's real cheap and will only cost you your entire life savings. What a deal!" Obviously that's my own little bit of hyperbole. Some thinkers latch onto the RC as a means to question knowledge, which in and of itself is an endeavour I can get behind. However, I think it is important not to overreach when discussing the RC, and to always finish with the reminder that while science is (by design) unlikely to ever be more than a useful abstraction of concrete reality (I need to write something on Whitehead and The Fallacy of Misplaced Concreteness), it has proven to be more reliable than other methods of knowledge gathering. Not perfect by any means, just the most practical that we currently have. If we accept that premise and acknowledge that the RC may not be the catastrophic-end-of-days-apocalypse that some misinformed interlocutors would have you think, is it still a significant problem? I want to say yes, and part of why I am writing this is so that I can put together a reasoned position for why.
In its purest form, the RC can be explained as a later study being unable to replicate the experimental results of an earlier one. This is the easily explained replication part of the discussion. What makes it a crisis is the notion that this is occurring more often than is expected and/or accepted by the scientific community. That's not to say that every single experiment ever is failing to be replicated, just more than the community would like.
If you've read anything else on this site you may know that my tertiary education has primarily focused on neuroscience, psychology and philosophy. It should then come as no surprise that I am focused on how the RC is manifesting in psychology, so much so that my master's thesis is devoted to exploring this further. Anyone with five minutes on Google will quickly be inundated with articles shredding the entire scientific field of psychology (yes it is a science; I think I've already promised to write an article on how I defend psychology as a science) as dissolving before our very eyes due to everything ever being irreplicable. Again, a bit of hyperbole, but not too far from what I've actually read. The thing is, I've seen some writers use the RC as a label when only one experiment has failed to replicate a prior one. Let me be clear: to me, using the term RC to describe a single pair of experiments is premature to the point of irrelevance.
Simply knowing that two experiments found different results, by itself, does not a crisis make. If this is all that is known about the two experiments, then it is also invalid to infer anything beyond the fact that they found different results. Further analysis is required before any additional conclusions can be drawn. The first question that needs to be asked is whether the experiments were as close to identical as possible. For example, let's consider a hypothetical experiment testing how many numbers a person can remember at once. Experiment 1 found that the average number was 11, but Experiment 2 found that the average number was 23. Right away we can conclude that the second experiment failed to replicate the findings of the first. From the little we know, that is a fair and reasonable claim. Before we can assert anything else though, we need to consider how similar the experimental conditions were.
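To make the hypothetical a little more concrete, here is a minimal sketch in Python. The individual scores are invented purely for illustration (the only things taken from the example are the two averages, 11 and 23), and Welch's t statistic is just one common way to compare two means, not a claim about how any real replication study was analysed.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical scores, invented for illustration only.
# They are chosen so the averages match the example: 11 and 23.
experiment_1 = [9, 10, 11, 11, 12, 13]
experiment_2 = [20, 22, 23, 23, 24, 26]

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

t = welch_t(experiment_1, experiment_2)
print(f"means: {mean(experiment_1)} vs {mean(experiment_2)}, Welch t = {t:.2f}")
```

A large statistic like this only tells us that the two sets of numbers differ. It says nothing about why they differ, which is exactly where the questions about experimental conditions come in.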
First, I would suggest it is important to look at who was involved. In our hypothetical, Experiment 1 was conducted by Odutalin Graytank and Experiment 2 was conducted by Lyrei Heidi. Both imaginary researchers are qualified with quality track records, so it may be instantly assumed that we can move on to other factors. However, as Daniel Kahneman has argued, no two researchers think identically. It is reasonable to question whether even two researchers with the same socio-cultural background are performing experiments in line with the same paradigms. In an ever-growing multicultural scientific milieu it may be irresponsible to conclude that both researchers used identical definitions and techniques.
Second, what experimental techniques were used? If we look at both researchers' papers and see that their methodologies read more or less the same, some might assume that the techniques were the same too. But, as was suggested by the first issue, and by the philosophy of language as a whole, there is no guarantee that the two researchers were sharing identical definitions for every word and phrase. The individual definitions may be close enough that the researchers appear to be performing identical experiments, but before any further conclusions can be drawn, it is worth confirming.
Third, what about the thing being experimented on? In a simpler experiment where all that was being done was testing the acceleration of a 1 kg ball of lead in a vacuum when dropped from 2 km above sea level, there are already countless additional variables that would need to be accounted for. For example, since a perfect vacuum is more theoretical than practical, how close were the differing conditions? How was the vacuum achieved? If the results were different, it may also be worth considering other things that may have influenced the degree of acceleration due to gravity. Were both experiments done in the same place? Where was the moon at the time of each experiment? And so forth. I call this a simpler experiment than the original example concerning human subjects, yet something like gravity is so complex that current human understanding has barely scratched the surface. Human cognition is at the very least just as complex; I would argue, even more complex. So, when an experiment is being performed on human cognition we need to pay at least as close attention to the confounding variables as we would in a physics experiment.
And so forth.
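To put a rough number on just one of the variables in the physics example, here is a small Python sketch of how gravitational acceleration changes with altitude. It assumes a perfectly spherical, non-rotating Earth and uses the conventional standard gravity and mean Earth radius; the function name is my own, and a real experiment would have to account for far more than this.

```python
# Standard gravity at sea level (m/s^2) and mean Earth radius (m).
G0 = 9.80665
EARTH_RADIUS_M = 6_371_000

def gravity_at_altitude(height_m):
    """Inverse-square approximation for a spherical, non-rotating Earth."""
    return G0 * (EARTH_RADIUS_M / (EARTH_RADIUS_M + height_m)) ** 2

sea_level = gravity_at_altitude(0)
two_km = gravity_at_altitude(2_000)
print(f"sea level: {sea_level:.4f} m/s^2, 2 km up: {two_km:.4f} m/s^2")
```

The difference is small, but it is not zero, and altitude is only one of the conditions that two supposedly identical drops might not share.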
At this point, I should point out that I'm not suggesting every scientific experiment in all disciplines needs to account for all possible semantic and methodological variables. If that were the expectation then nothing would ever get done. This is more a concern for when replication fails. If two experiments found the same results, then for the sake of practical progress it may be sufficient to conclude that they were close enough. When replication fails, before a witch-hunt is called, I propose that these variable conditions are worth examining further.
With all this laid out, I hope that my point is clear. If not, let me reiterate it: a single pair of experiments with different results is not a crisis.
However, I will argue that a pattern of single-pair replication failures is a crisis. Not necessarily a crisis of the results, but perhaps a crisis of interpretation. This is where I still need to do some more research and get my thoughts together a bit more. The crisis as I currently see it may have more to do with communication and interpretation than with poor experimentation. I feel that different things I have read from thinkers like Ioannidis, Abend, Chalmers, Kahneman, etc. have all contributed to my thinking this. But it would be arrogant not to consider that what I've concluded from their writings may be naive to the point of invalidity. The next post in this series will come after I have actively tried to create a detailed map of the characteristics of the broader RC and of the RC as it specifically manifests in psychology. Hopefully by then I will be able to give a more coherent and academic description of the RC with less of my own personal intuition.