Radiology: Improving $1.6B mammo false-positive rate proves complex
The somewhat surprising results led researchers to suggest that policymakers must consider the interplay of various factors if volume requirements are revised in an attempt to reduce false positives.
The U.S. has higher false-positive rates and similar cancer detection rates (CDRs) compared with other countries with established screening programs. Previous data, while inconsistent, have indicated that radiologists who read more studies have lower false-positive rates with similar sensitivities, and the Institute of Medicine has issued a call for studies to assess the relation between volume and performance.
In this study, researchers reviewed interpretive volume measures for 120 radiologists in the Breast Cancer Surveillance Consortium (BCSC) from 2002 to 2006 to evaluate the impact of interpretive volume on screening mammography performance.
The researchers, led by Diana S. M. Buist, PhD, MPH, from the Group Health Cooperative in Seattle, gathered total, screening and diagnostic volume for each radiologist for each year, assessed “screening focus” and linked volume measures for each year to screening performance for the following year.
They also completed a simulation to evaluate the impact of increasing the minimum volume, basing their results on 34 million women aged 40 to 79 years undergoing screening.
Buist et al reported a mean sensitivity of 85.2 percent, a false-positive rate of 9.1 percent and a CDR of 4.2 per 1,000 mammograms. “Radiologists with lower screening volumes had higher false-positive rates. … Interpreters with the highest diagnostic volume had higher false-positive rates. The lowest false-positive rates were among radiologists with a screening focus of 90 percent or greater (5.6 percent), and this group had the lowest CDR (3.4 per 1,000),” according to the researchers. Meanwhile, radiologists with a screening focus of less than 80 percent had a false-positive rate of 10.7 percent and a CDR of 4.8 per 1,000 studies.
Among the entire cohort, 22.3 women were recalled for each cancer detected. Radiologists with a screening focus (percentage of total mammograms that were screening studies) of 90 percent or more had the best performance, with a mean of 14.5 recalls per cancer detected, but they had lower sensitivity than their colleagues with less screening focus.
“Radiologists with a screening focus of less than 80 percent had higher sensitivity but recalled 23.8 to 28 women per cancer detected,” wrote Buist.
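As a rough check, the recalls-per-cancer ratios follow almost directly from the rates quoted above: per 1,000 screening mammograms, the false-positive rate gives the women recalled without cancer and the CDR gives the cancers found. The sketch below is a back-of-the-envelope illustration only; the simplifying assumption that each detected cancer contributes exactly one recall is ours, not the study's methodology.

```python
def recalls_per_cancer(fp_rate_pct, cdr_per_1000):
    """Approximate recalls issued per cancer detected.

    fp_rate_pct:   false-positive rate as a percentage (e.g. 9.1)
    cdr_per_1000:  cancers detected per 1,000 mammograms (e.g. 4.2)
    """
    false_positives = fp_rate_pct * 10   # recalls without cancer, per 1,000 screens
    true_positives = cdr_per_1000        # recalls with cancer, per 1,000 screens
    return (false_positives + true_positives) / true_positives

# Cohort-wide rates from the study: FP rate 9.1%, CDR 4.2 per 1,000
print(round(recalls_per_cancer(9.1, 4.2), 1))  # ~22.7
```

The result lands near the reported cohort-wide figure of 22.3; the small gap presumably reflects that the study averaged at the radiologist level rather than pooling rates, but the sketch shows how tightly the ratio is determined by the two rates.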
False-positive rates were lowest among radiologists who read between 1,500 and 4,000 mammograms annually. Breast imagers with the lowest diagnostic volume also had lower false-positive rates, the researchers said.
Current FDA regulations require that radiologists read at least 960 mammograms within the previous 24 months. The researchers’ simulation showed that increasing the annual minimum volume could curb diagnostic work-ups with a very small reduction in cancer detection. Upping the annual requirement to a minimum of 1,000 studies would result in 43,629 fewer recalls while missing 40 cancers and detecting 143,215 cancers, reported Buist and colleagues. A baseline requirement of 1,500 mammograms would cut recalls by 92,838 while missing 761 cancers and detecting 142,494 cancers, added the researchers.
With false-positive work-up costs of approximately $1.6 billion annually in the U.S., raising total volume requirements to 1,000 or 1,500 mammograms annually would save $21.8 million or $46.4 million, respectively, the researchers estimated.
If policymakers instead focused on annual screening volume and enacted a minimum of 1,000 screening studies, 71,110 fewer recalls would be issued at the expense of 415 cancers while detecting 141,413 cancers and reducing work-up costs by $35.6 million, continued Buist et al. Raising the screening threshold to 1,500 would cut recalls by 117,187, miss 361 cancers and detect 141,467 cancers while cutting costs by $58.6 million.
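Across all four simulated scenarios, the reported savings work out to roughly $500 per recall avoided, which makes the trade-offs easy to tabulate. The sketch below uses only the figures quoted above; the ~$500 per-work-up cost is our inference from dividing savings by recalls avoided, not a number the study states.

```python
# Scenarios from the Buist et al simulation, as reported above:
# (label, recalls avoided, cancers missed, savings in $ millions)
scenarios = [
    ("total volume >= 1,000",      43_629,  40, 21.8),
    ("total volume >= 1,500",      92_838, 761, 46.4),
    ("screening volume >= 1,000",  71_110, 415, 35.6),
    ("screening volume >= 1,500", 117_187, 361, 58.6),
]

for label, recalls, missed, savings_m in scenarios:
    cost_per_recall = savings_m * 1_000_000 / recalls  # implied per-work-up cost
    missed_per_10k = missed / recalls * 10_000         # cancers missed per 10,000 fewer recalls
    print(f"{label}: ~${cost_per_recall:,.0f} per recall avoided, "
          f"{missed_per_10k:.0f} cancers missed per 10,000 fewer recalls")
```

The implied cost clusters tightly around $500 in every scenario, while the cancers missed per 10,000 recalls avoided vary widely, which is the trade-off the researchers flag.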
The findings revealed the complexity of predicting and improving diagnostic performance. “Contrary to our expectations, we observed no clear associations between volume and sensitivity,” wrote Buist.
Plus, the researchers observed wide variations in performance among radiologists within volume levels, leading them to hypothesize that “screening performance is unlikely to be affected by volume alone, but rather by a balance in interpreted exam composition. The techniques and skills required for interpreting different images differ and will also generate different performance measures—for example, recognizing normal and benign variants is important for false-positive rates, whereas detecting subtle cancer findings is important for sensitivity.”
Buist and colleagues also pointed out that they could not determine cause and effect related to the differences for radiologists with greater diagnostic volumes.
The researchers concluded that “changing MQSA [Mammography Quality Standards Act] volume requirements or adding minimum numbers of screening or diagnostic examinations could result in modest improvements in some screening outcomes at a cost to others.” However, they emphasized that the change could prompt lower-volume radiologists to cease reading mammograms, creating a workforce issue.
Ultimately, Buist and colleagues recommended additional studies to further evaluate the interrelationships between training, experience, volume and performance measures, and suggested that policymakers may need to adjust several metrics simultaneously.