The “file drawer” problem refers to the tendency of scientists to consign “negative data” (or “negative results”) to dusty file drawers and forget about them. This practice, we’re told, is widespread, distorts the scientific literature, and has major adverse consequences. Sitting on a lot of negative data skews the published literature toward positive results, thereby contributing to publication bias. Publication bias in turn pressures scientists to report only positive data, and the cycle continues to the detriment of an honest picture of scientific knowledge.
The file drawer problem is undoubtedly genuine, although it is hard to gauge how extensive it is. Still, fostering counteracting forces that reward the publication of certain negative data would be a very good thing. In particular, if we recognize the central place of the hypothesis in scientific reasoning and the critical role that negative data play in testing hypotheses, we’ll hold negative data in higher esteem than we now do. Yet unqualified calls for the publication of negative data merely muddy the waters, making a solution to the problem more difficult. Blanket assertions about file drawers and negative data stem from a mistaken view of science, and clarifying a few terms should help. I will argue that, while we should demand that much negative data be made accessible, we shouldn’t be too worried when other data are slipped into file drawers, because that’s where they probably belong. How can we separate the wheat from the chaff?
First off, what does the phrase “negative data” mean? Unfortunately, it has been indiscriminately applied to two quite distinct classes of results: those that permit conclusions to be drawn from them (I’ll call these actionable: serving as the basis for a decision or action, not necessarily a lawsuit) and those that do not (non-actionable). Let’s start with the non-actionable results, because they are the most intuitively obvious. For a variety of reasons, including unexplained variability, uncontrolled noise, inadequate experimental design or methods, insufficient group sizes, etc., these data admit of no firm conclusions whatsoever. You just can’t say anything about them. Suppose you’re studying the effects of drugs on rats’ ability to learn to run a maze. One day, on a whim, you run a small pilot experiment in which you test Drug X on six rats. The results are entirely non-actionable: two rats improved, one got worse, and three were unchanged. You tuck the results into a file drawer and forget about them. I suspect that the vast majority of file-drawer-worthy negative data are of this sort. What about the argument, often implied if not stated, that there should be no such data; that we should always strive to make every result rigorous and worthy of publication? This is neither possible nor desirable, and thinking it is misunderstands how science is conducted. We’ll return to this topic shortly.
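To make concrete why such a pilot admits of no conclusion, here is a minimal sketch using the hypothetical 2/1/3 outcome above and a simple sign test. The numbers and the choice of test are illustrative assumptions only, not a prescription for how pilot data should be analyzed.

```python
# A minimal sketch (hypothetical numbers from the pilot example above): a sign test
# on the six-rat pilot, treating "improved" vs. "worsened" as the only informative outcomes.
from math import comb

improved, worsened, unchanged = 2, 1, 3   # hypothetical pilot outcomes
n = improved + worsened                   # ties (unchanged rats) carry no sign information

# Exact two-sided binomial (sign) test against chance (p = 0.5)
probs = [comb(n, i) * 0.5**n for i in range(n + 1)]
p_value = sum(p for p in probs if p <= probs[improved] + 1e-12)

print(f"informative rats = {n}, exact two-sided p = {p_value:.2f}")  # p = 1.00
```

With only three informative animals, the exact two-sided p-value is 1.0; a pilot this small simply cannot distinguish a drug effect from chance, which is exactly what “non-actionable” means.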
Turning to actionable results, I want to emphasize, first, that such results are not absolutely certain or unambiguously true; we’re talking about science in the real world here, where nothing is ever 100% certain. Actionable results are simply those that allow reasonably convincing conclusions to be drawn from them. This means that they were obtained using good experimental design principles and methods, and the results appear to be solid.
Actionable negative data come in at least two flavors: a) those showing that the experimental treatment or condition had an effect contrary to the “positive effect” that had been expected or predicted, and b) those showing that the treatment had no effect (meaning, of course, no “statistically significant” effect, a subject for another discussion). In the first case, the data are defined as “negative” simply because they are not the “positive” ones expected; something happened, it just wasn’t what you predicted would happen. Let’s say that Drug X had been reported to improve rats’ maze-learning ability, and your data demonstrated that it worsened their ability to learn. Contradictory data like these are often referred to as negative. On the other hand, maybe you found that a purportedly effective experimental treatment was apparently inactive, i.e., Drug X had no significant effect on rats’ learning ability. These data would be negative in both senses: they contradict the predicted positive effect and they demonstrate a lack of effect of X. Despite the nuances, these are all examples of actionable data and should be reported. They represent valid and significant contributions to knowledge, and it is wrong to hide them in the file drawer.
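For the second flavor, “no statistically significant effect” is ordinarily a statistical decision. The sketch below shows one common way such a conclusion might be reached, using invented maze-completion times and a two-sample t-test; the data, group sizes, and significance threshold are assumptions for illustration, not results from any real study.

```python
# A minimal sketch (hypothetical data): deciding "no significant effect" of Drug X
# on maze-completion time with a two-sample t-test.
from scipy import stats

control_times = [41.2, 38.5, 44.0, 40.1, 39.8, 42.7, 37.9, 43.3]   # seconds to finish maze
drug_x_times  = [40.6, 39.9, 42.1, 41.5, 38.8, 43.0, 40.2, 41.7]   # hypothetical Drug X group

t_stat, p_value = stats.ttest_ind(control_times, drug_x_times)
alpha = 0.05  # conventional, and itself debatable

print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
if p_value >= alpha:
    print("No statistically significant effect of Drug X: an actionable negative result of type b.")
else:
    print("Significant difference detected.")
```

The point is not the particular test but that a negative conclusion of this kind rests on an adequately powered, well-designed comparison, which is what separates it from the non-actionable pilot above.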
Results like those above relate to a noteworthy aspect of the world. Usually, they are part of a test of a substantive hypothesis, whether explicit or implicit, that offers a potential explanation for some observation. In the course of collecting new exploratory or descriptive data, researchers inevitably come across unexplained phenomena, which they then try to explain. Their putative explanations are hypotheses, and doing experiments is how we test hypotheses. We seek to obtain actionable data that are either consistent with, or contrary to, the hypothesis. The contrary (negative) data falsify the hypothesis. If your hypothesis is that Drug X improves rats’ maze-learning by strengthening synapses in their brains, then one prediction is that X will improve maze-learning ability. If Drug X does not affect maze-learning, this information falsifies the hypothesis, so it is critical to have it. The most important objective of science is to understand nature, and the formulation and testing of hypotheses is the major method we use in striving for understanding. Keeping this overarching goal in mind fosters a richer appreciation for actionable negative data.
As I discuss elsewhere1, we scientists have a regrettable tendency to avoid stating our hypotheses, even when we have them and are clearly engaged in testing them (the unstated ones are implicit hypotheses). Besides making the scientific literature harder to follow, the habit of masking our intentions contributes to a reluctance to publish negative data. Without this important context, the significance of negative data can easily be lost. In other words, by neglecting to acknowledge our hypotheses, we foster the disrespect in which some authors and journals hold the negative data pertaining to them.
Logically, there is another class of actionable negative results, but I don’t know how large it is. To see what’s going on, it is useful to view experimental results from a different perspective. On one hand, we have connected results, which are more-or-less obviously linked to established scientific findings; we can see how these results will fit into the existing body of scientific knowledge. On the other hand, unconnected results lack any obvious linkage. You might collect unconnected results (what a colleague once referred to as doing “scientific stuff”) to pad your resume and demonstrate “productivity,” or because you were curious about an outcome for “no good reason.” Suppose you were given a newly synthesized chemical compound, Drug X, and, though it had no known biological activity, you tried giving it to your rats. After a thorough study, you conclude that X decreased the animals’ maze-learning ability, although you still have no clue as to why. Your results do not obviously fit into the present body of scientific knowledge, and their value is at best questionable (e.g., maybe Drug X damages brain cells). Nevertheless, a case can be made for publishing these positive results. After all, we’re not omniscient, and there are plenty of instances in which apparently unconnected results turned out to be highly significant.
It is much harder to justify publishing unconnected negative results. If there was no reason whatever to suspect that Drug X would affect the rats, and your thorough experimental study demonstrates that, in fact, it doesn’t, we might well suspect that no real purpose is served by publishing the results. There are literally countless possibilities for generating unconnected, effectively meaningless, negative results; there is no point in cluttering up the scientific record with them. If subsequent developments were to warrant it, they could always be retrieved from the file drawer and published.
To summarize, the term “negative data” has been used to refer to distinctly different concepts. It either refers to results that are in opposition to positive data, or is a synonym for results that do not permit any conclusion whatsoever, e.g., they are incomplete, unintelligible, or uninterpretable. Conflating these two very different meanings creates needless confusion and causes some scientists to undervalue negative data.
Actionable negative data are often highly valuable, especially when they are obtained in the course of testing a hypothesis. If the negative results falsify a hypothesis, then they should be treated with the same respect as positive results are. Recognizing the pervasiveness of hypotheses – both explicit and implicit – and hypothesis-testing in science can help reduce the file drawer problem by rewarding investigators for publishing negative results. Perhaps we can also agree that a bunch of uninterpretable results should not count as “data” at all and that, in general, there is no real problem if they continue to rest in file drawers.
1 Alger, B.E., Chapter 2, “The Scientific Hypothesis Today,” in Defense of the Scientific Hypothesis: From Reproducibility Crisis to Big Data (New York: Oxford University Press, 2019), pp. 31-60.