Scientific data reproducibility – Bad data leads to false hopes and delays finding cures
June 24, 2015

So much has been said, and should be said, about the reproducibility, or lack thereof, of scientific data. Fundamentally, the inability to reproduce published data means, at best, that time and money are wasted. At worst, it leads to years of pointless investigation, enormous sums of money lost, diminished respect for science, delays in new treatments or cures, and increased skepticism about biomedical research. Patients and their families want to get better, to be healthy, to live. Period.
While no one really knows or can quantify how much published data cannot be reproduced, or how much money is lost to irreproducible (or bad) data, estimates of the share of basic research that cannot be replicated range from 25% to 89%. Part of this wide range comes down to how "irreproducible" is defined.
What is lack of reproducibility? Some define it conservatively: papers that don't include enough information in the methods or results sections for another scientist to repeat the experiment without additional input. This means the source and concentration of every reagent, and every temperature, time, and step in each experiment, must be fully disclosed. It also means that each experiment must consistently yield the same result. This is what we are taught in middle school when we are first exposed to biology and the scientific method. Less conservative definitions also include faulty methods, sloppy studies, missing or inadequate controls, poor comparisons between groups, inappropriate or meaningless statistics, misinterpretation of the data, or even false data.
Falsified or fraudulent data does not contribute significantly to this problem.
A recent paper by Leonard Freedman, Iain Cockburn, and Timothy Simcoe in PLOS Biology discusses the economic impact of this problem. They estimate that roughly 50% of preclinical research publications cannot be reproduced, which translates into about $28 billion wasted in the US annually. As they note, because basic science is largely funded by government agencies, most of that money ultimately comes from taxpayers. In the past several years, NIH has focused on this issue and has been developing policies and methods to address it. (For more info, click here or here)
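That headline number is essentially a multiplication of two estimates. Here is a rough back-of-the-envelope sketch of the arithmetic in Python; the inputs (about $56 billion in annual US preclinical research spending and an irreproducibility rate near 50%) are approximations of what the paper reports and should be read as illustrative assumptions, not exact values.

```python
# Back-of-the-envelope sketch of the kind of estimate in Freedman et al.
# The inputs are approximate figures as reported in the paper; treat them
# as illustrative assumptions, not authoritative values.

annual_us_preclinical_spend = 56.4e9   # ~$56.4B spent on US preclinical research per year
irreproducibility_rate = 0.50          # ~50% of studies estimated to be irreproducible

wasted_per_year = annual_us_preclinical_spend * irreproducibility_rate
print(f"Estimated spending on irreproducible preclinical research: "
      f"${wasted_per_year / 1e9:.1f}B per year")
# -> roughly $28B per year, the headline figure quoted above
```

The point of laying it out this way is that the $28B figure is only as good as those two inputs, which is exactly why the estimates of irreproducibility itself matter so much.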
So, what can be done about it?
Among the suggestions –
Better training for students. This will increase awareness of the problem and produce better reviewers (since grant proposals and papers are all peer-reviewed).
Guidelines and recommendations. These would spell out what a rigorous, reproducible study should include.
Checklists for reviewers at journals and funding agencies. These would require the inclusion of specific items, such as reagent identity and concentration, and papers would not be published unless the checklist was complete.
Verification of cell lines. Cell lines are used ubiquitously in basic research, yet they are not routinely checked to confirm their identity or purity (for example, for mycoplasma contamination). Many believe that a large share of the cell lines in use are misidentified or contaminated.
The most public example of this is a cell line called MDA-MB-435, initially established from a breast cancer tumor in the 1970s. Shortly after it was established, it was contaminated by a melanoma cell line that overgrew the culture and displaced the breast cancer cells. It took more than 20 years to confirm that MDA-MB-435 is actually a melanoma line, and in the meantime hundreds of breast cancer papers were published that included experiments conducted in this “breast cancer” cell line. Some additional info can be found here, here, and here.
Verifying cell lines can be costly, since it ideally should be done both before and after a set of experiments. Many claim this would raise the expense of experiments to a staggering level. But if $28B is wasted per year, and verifying cell lines would alleviate even some of that waste, isn't it worth the investment? Over the long term, would it not pay off? (A simplified sketch of what such an identity check actually involves follows below.)
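For readers unfamiliar with how verification works, here is a minimal sketch of the short tandem repeat (STR) profile comparison that authentication services use to confirm a cell line's identity. The loci, allele values, and scoring below are simplified illustrations rather than a real protocol; the roughly 80% match threshold is the commonly cited cutoff from the ANSI/ATCC authentication standard, and real services use more markers and more careful matching rules.

```python
# Minimal sketch of STR-based cell line authentication: compare a lab line's
# STR profile against a reference profile and compute a percent match.
# Loci and allele values below are made up for illustration only.

def str_match_percent(query: dict, reference: dict) -> float:
    """Percent of shared alleles across loci present in both profiles (simplified scoring)."""
    shared = 0
    total = 0
    for locus in query.keys() & reference.keys():
        q, r = set(query[locus]), set(reference[locus])
        shared += len(q & r)
        total += len(q | r)
    return 100.0 * shared / total if total else 0.0

# Hypothetical profiles: locus -> observed alleles
reference_profile = {"D5S818": [11, 12], "TH01": [7, 9.3], "TPOX": [8, 8], "vWA": [16, 18]}
lab_line_profile  = {"D5S818": [11, 12], "TH01": [7, 9.3], "TPOX": [8, 11], "vWA": [16, 18]}

match = str_match_percent(lab_line_profile, reference_profile)
# ~80% is the commonly cited threshold for calling two profiles the same line.
print(f"STR match: {match:.0f}% -> "
      f"{'authenticated' if match >= 80 else 'possible misidentification'}")
```

The test itself is routine; the cost comes from running it repeatedly, on every line, before and after every set of experiments, which is where the institutional support discussed next comes in.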
Maybe one solution is for institutions to offer this service to researchers so that grant dollars aren't spent on it directly. Maybe there is infrastructure funding, awarded to an institution rather than to an individual lab, that could cover the cost of verifying cell lines. This would improve the accuracy of the data, and the institution would strengthen its own reputation and standards by ensuring sound results.
There is much work to be done to tackle this problem. It is uncomfortable to talk about – but necessary. People depend on it.
What do you think? Any other ideas on how to improve the situation?