How Often Do Scientists Commit Fraud?
The evidence says scientists are pretty honest, and new techniques could make it easier to catch scientific fabricators
Gallup’s annual poll of which professions are the most trustworthy doesn’t ask about scientists, but it’s safe to say that at the very least they’d rank far higher than the used car salespeople and members of Congress at the bottom.
At the same time, among the thousands of people globally who practice science and publish their results, some minority likely yield to the temptation to massage data to achieve attention-getting (and funding-friendly) results. In recent years, it has become politically useful for some to seize upon this possibility and allege deliberate scientific fraud. (Charges that man-made climate change is a widespread scientific conspiracy have only become more common since the so-called Climategate scandal of 2009, despite several investigations that have failed to find any evidence of fraud or scientific misconduct.)
But how often do scientists actually lie about their data? In other words, how much should we trust them?
The answer, at least according to a study published today in the Proceedings of the National Academy of Sciences, is that on the whole, scientists are a pretty honest group. In the paper, medical researchers from the University of Washington and elsewhere found that of the more than 25 million biomedical research articles in the National Institutes of Health’s PubMed database, dating back to the 1940s, 2,047 had been retracted at some point since publication. That’s less than 0.01 percent of all the papers in the database.
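The arithmetic checks out: 2,047 retractions among 25 million articles works out to roughly 0.008 percent, comfortably under the 0.01 percent mark.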
The researchers broke their results down further, attempting to attribute each retraction to a cause. By their accounting, 21.3 percent were due to honest error, such as unintentional misinterpretation of data. Meanwhile, 67.4 percent could be attributed to some sort of misconduct: fraud or fabrication (43.4 percent), plagiarism (9.8 percent) and duplicate publication (14.2 percent). Compared with articles retracted before 1975, those retracted afterward were ten times more likely to stem from fraud rather than honest error.
The overall modest rate of fraud could explain why the authors of the blog Retraction Watch, which documents retracted papers, have encountered opposition. Some say that directing attention towards isolated cases of dishonesty disproportionately increases public mistrust in science as a whole. “The argument goes something like this,” they wrote in May in Lab Times. “Scientific fraud is rare, so focusing on misconduct gives a distorted picture of research that will only give ammunition to critics, who want to cast doubt on subjects such as climate change and vaccine safety.”
One response might be that we don’t actually know how rare fraud is, despite the 0.01 percent retraction figure from this new PNAS study. As the study’s authors note, in many cases an article might be suspect, but a journal doesn’t have enough proof to actually retract it. In 2005, for example, The Lancet “expressed concern” about the results of a study that found a correlation between a Mediterranean diet and a reduced risk of heart disease, but it didn’t ultimately retract the paper.
Moreover, we have no way of knowing how many suspect data sets never even come to light. A fabricated data set might not prove replicable by other researchers, but in many cases, it’s doubtful this would prompt them to allege dishonesty. Historically, many cases of scientific fraud have been exposed only by internal whistle-blowers.
Recent events, though, indicate that we might be entering an age in which scientific discoveries actually help us detect fraud, or at least some types of it. This past July, social psychologist Uri Simonsohn of the University of Pennsylvania garnered headlines by using an innovative statistical analysis to detect fabricated data in the work of social psychologist Dirk Smeesters, who had written a paper finding a positive effect of color on consumer behavior.
Simonsohn’s technique is complex but relies on the fact that people are notoriously bad at faking data with the same sort of randomness that occurs in real events. As he told Nature, “The basic idea is to see if the data are too close to the theoretical prediction, or if multiple estimates are too similar to each other.”
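That one-line description is enough to sketch the flavor of such a test. What follows is a hypothetical Python illustration of the “too similar” idea, not Simonsohn’s actual algorithm: given the standard deviations reported across several experimental conditions, it simulates how much those statistics ought to vary under honest sampling and asks how often chance alone would cluster them as tightly as the reported ones. The function name, the normality assumption and every number in it are invented for illustration.

import numpy as np

def too_similar_pvalue(reported_sds, n_per_group, n_sims=10_000, seed=0):
    """Estimate how often honest sampling yields SDs this tightly clustered.

    Assumes each condition draws n_per_group observations from normal
    populations sharing one true standard deviation (approximated here by
    pooling the reported SDs). A tiny result means the reported SDs are
    "too similar": less spread out than sampling error alone produces.
    """
    rng = np.random.default_rng(seed)
    reported_sds = np.asarray(reported_sds, dtype=float)
    k = len(reported_sds)
    pooled_sd = np.sqrt(np.mean(reported_sds ** 2))

    # The statistic under test: how spread out the reported SDs are.
    observed_spread = np.std(reported_sds)

    # Simulate k honest samples many times; record the SD spread each time.
    sims = rng.normal(0.0, pooled_sd, size=(n_sims, k, n_per_group))
    sim_spreads = np.std(sims.std(axis=2, ddof=1), axis=1)

    # Fraction of honest simulations at least as tightly clustered as reported.
    return np.mean(sim_spreads <= observed_spread)

# Hypothetical example: five conditions of 15 subjects each, with reported
# SDs clustered implausibly tightly around 2.0.
p = too_similar_pvalue([1.99, 2.01, 2.00, 1.98, 2.02], n_per_group=15)
print(f"p(honest data would be this similar) = {p:.4f}")

On those invented numbers the test returns a value near zero: five sample standard deviations from groups of 15 subjects should scatter far more widely than a range of 0.04. Real forensic work, including Simonsohn’s, has to be far more careful about assumptions and multiple comparisons, but the core logic is the same.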
Soon after Smeesters’ resignation, Simonsohn made his algorithm public, encouraging researchers to publish their raw data and others to put it to the test. He hopes the real possibility of being caught will act as a powerful deterrent to researchers tempted to manipulate their data. In theory, this would not only decrease the amount of fraud but also increase the trust we can place in the products of science as a whole.