Probability and medical research: How P value reporting has changed over time

Over the past 25 years, abstracts and full-text articles in the medical literature have increasingly included P values for specific findings. However, common statistical oversights persist, according to a study published online March 15 in JAMA.

The P value has been used to convey the statistical significance of study results since it entered the medical research lexicon in 1925 through R.A. Fisher’s Statistical Methods for Research Workers. Fisher defined the P value as “the probability of the observed result, plus more extreme results, if the null hypothesis were true.”
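Fisher's definition can be made concrete with a small calculation. The sketch below is an illustration only, not something taken from the study: it assumes a hypothetical coin-flip experiment, takes "the coin is fair" as the null hypothesis, and adds up the probability of the observed result plus every more extreme one. The function name `binomial_p_value` and the numbers are invented for the example.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided P value: probability of `successes` or more under the null."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 9 heads in 10 flips, with a fair coin as the null hypothesis.
p = binomial_p_value(9, 10)
print(f"P = {p:.4f}")  # prints "P = 0.0107"
```

Here the observed result (9 heads) and the only more extreme result (10 heads) together account for 11 of the 1,024 equally likely outcomes, which gives a P value of about 0.011.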

But there is increasing concern that P values reported in biomedical studies today are frequently misused and misunderstood, said lead author David Chavalarias, PhD, of the Complex Systems Institute in Paris, and his colleagues.

“There is mounting evidence from diverse fields that reporting biases tend to preferentially select the publication and highlighting of results that are statistically significant, as opposed to ‘negative’ results,” they wrote. “Such biases could have major implications for the reliability of the published scientific literature.”

Chavalarias and his team set out to evaluate how P values have been reported over the past 25 years in abstracts and full texts of biomedical research articles, as well as how often alternative statistical measures were included. To do so, they performed a text-mining analysis of more than 13 million abstracts and articles published between 1990 and 2015 and indexed in MEDLINE and PubMed Central (PMC), collecting data on the presence of P values.
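To give a sense of what text mining for P values involves, here is a minimal sketch in Python. It is not the method the researchers used: the regular expression, the function names, the significance rule (a reported value at or below 0.05), and the sample sentence are all assumptions made for illustration, and a real analysis would need to handle many more reporting styles.

```python
import re

# Illustrative pattern only: matches forms such as "P < .001" or "p = 0.32".
# It is not the expression used in the study, and it ignores variants such as
# "P value of 0.03" or scientific notation.
P_VALUE_PATTERN = re.compile(r"\bp\s*(<=|>=|<|>|=)\s*(\d*\.\d+|\d+)", re.IGNORECASE)


def extract_p_values(text):
    """Return (operator, value) pairs for each P value found in the text."""
    return [(op, float(num)) for op, num in P_VALUE_PATTERN.findall(text)]


def is_significant(op, value, alpha=0.05):
    """Simplified rule (an assumption): 'P <', 'P <=' or 'P =' a value at or below alpha."""
    return op in ("<", "<=", "=") and value <= alpha


# Hypothetical abstract text, written for this example.
sample = "Mortality was lower with treatment (P < .001); readmission did not differ (p = 0.32)."
found = extract_p_values(sample)
print(found)  # [('<', 0.001), ('=', 0.32)]
print([is_significant(op, value) for op, value in found])  # [True, False]
```

Running a check like this over millions of abstracts, grouped by publication year, is one way the share of abstracts reporting P values could be tracked over time.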

They found that the share of published abstracts reporting P values more than doubled, from 7 percent in 1990 to nearly 16 percent by 2014. Nearly all full-text articles containing P values (96 percent) reported statistically significant results, and very few included alternative statistical measures.

“[M]ore MEDLINE abstracts and articles reported P values over time, almost all abstracts and articles with P values reported statistically significant results, and, in a subgroup analysis, few articles included confidence intervals, Bayes factors, or effect sizes,” the researchers concluded. “Rather than reporting isolated P values, articles should include effect sizes and uncertainty metrics.”
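As an illustration of what reporting an effect size with an uncertainty metric can look like, the sketch below computes a difference in means between two groups together with an approximate 95 percent confidence interval. The data, the function name `mean_difference_with_ci`, and the choice of a normal approximation are all assumptions made for this example and do not come from the study.

```python
from statistics import NormalDist, mean, stdev


def mean_difference_with_ci(group_a, group_b, confidence=0.95):
    """Return (difference, lower, upper) for mean(group_a) - mean(group_b).

    Uses a normal approximation, which is crude for small samples.
    """
    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference between two independent means.
    se = (stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical measurements, invented for this example.
treatment = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.2, 4.8, 4.5, 5.0, 4.4]
diff, lower, upper = mean_difference_with_ci(treatment, control)
print(f"Mean difference: {diff:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Unlike an isolated P value, this kind of summary tells the reader how large the difference is and how precisely it was estimated.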