Interpreting physicians whose screening mammography performance falls outside the cut points identified for minimally acceptable interpretive performance should be reviewed in the context of their specific practice settings and considered for additional training, according to a study published in the May edition of Radiology.
Patricia A. Carney, PhD, of the department of family medicine and department of public health and preventive medicine at Oregon Health & Science University in Portland, Ore., and colleagues sought to develop criteria to identify thresholds for minimally acceptable physician performance in interpreting screening mammography studies, as well as to examine the impact that the implementation of these criteria may have on radiology practices in the U.S.
The HIPAA-compliant study used an Angoff approach in two phases to set criteria for identifying minimally acceptable interpretive performance at screening mammography. Performance was measured by sensitivity, specificity, recall rate, positive predictive value of recall, positive predictive value of biopsy recommendation and cancer detection rate, with each measure considered separately, the authors explained.
In the first phase of the study, 10 expert radiologists considered a hypothetical pool of 100 interpreting physicians, and each proposed cut points for minimally acceptable performance. During each round of scoring, the expert radiologists’ cut points were summarized as a mean, median, mode and range, which were presented back to the group. From the beginning, the experts were told that any physician performance falling outside the cut points would result in a recommendation to consider additional training.
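The per-round summary described above can be sketched in a few lines of Python. This is only an illustration: the function name and the scores are hypothetical, not data or code from the study.

```python
from statistics import mean, median, multimode

def summarize_cut_points(scores):
    """Summarize one round of expert-proposed cut points for a single measure."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "mode": multimode(scores),            # a list, because ties are possible
        "range": (min(scores), max(scores)),  # lowest and highest proposal
    }

# Hypothetical lower cut points for sensitivity (percent), one per expert.
round_one = [70, 75, 75, 80, 75, 72, 78, 75, 70, 80]
summary = summarize_cut_points(round_one)
print(summary)
```

Feeding a summary like this back to the panel after each round is what lets the experts see where their own proposal sits relative to the group before rescoring.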
During the second phase, the average of the data was presented to the experts to demonstrate the potential impact the cut points would have on radiology practice. After reviewing the data, the experts rescored until they reached consensus. Lastly, simulation methods were used to estimate the potential impact if effective additional training moved underperforming physicians into the acceptable range.
The experts set the final cut points for low performance at a sensitivity of less than 75 percent; a specificity of less than 88 or greater than 95 percent; a recall rate of less than 5 or greater than 12 percent; a positive predictive value of recall of less than 3 or greater than 8 percent; a positive predictive value of biopsy recommendation of less than 20 or greater than 40 percent; and a cancer detection rate of less than 2.5 per 1,000 interpretations.
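As a rough sketch, the cut points above can be written as acceptable ranges and checked one measure at a time, since the study considered each measure separately. The dictionary keys, the function name and the example physician are hypothetical; only the threshold values come from the article.

```python
# Acceptable ranges implied by the final cut points (low, high).
# None means the article gives no bound on that side.
# Values are percentages, except cancer detection rate (per 1,000 interpretations).
CUT_POINTS = {
    "sensitivity": (75.0, None),
    "specificity": (88.0, 95.0),
    "recall_rate": (5.0, 12.0),
    "ppv_recall": (3.0, 8.0),
    "ppv_biopsy": (20.0, 40.0),
    "cancer_detection_rate": (2.5, None),
}

def measures_outside_cut_points(performance):
    """Return the measures on which a physician falls outside the acceptable range."""
    flagged = []
    for measure, (low, high) in CUT_POINTS.items():
        value = performance.get(measure)
        if value is None:
            continue  # measure not available for this physician
        below = low is not None and value < low
        above = high is not None and value > high
        if below or above:
            flagged.append(measure)
    return flagged

# Hypothetical physician: specificity too low and recall rate too high.
example = {
    "sensitivity": 80.0,
    "specificity": 86.0,
    "recall_rate": 13.5,
    "ppv_recall": 4.0,
    "ppv_biopsy": 25.0,
    "cancer_detection_rate": 3.0,
}
flagged = measures_outside_cut_points(example)
print(flagged)
```

Under the study's framing, a non-empty result would prompt a review in the context of the physician's practice setting and a recommendation to consider additional training, not an automatic sanction.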
According to the authors, the performance measure cut points could result in 18–28 percent of interpreting physicians being considered for additional training on the basis of sensitivity and cancer detection rate, while the cut points for specificity, recall, positive predictive value of recall and positive predictive value of biopsy recommendation would likely affect 34–49 percent of all practicing interpreters.
In addition, Carney and colleagues said: “If underperforming physicians moved into the acceptable range, detection of an additional 14 cancers per 100,000 women screened and a reduction in the number of false-positive examinations by 880 per 100,000 women screened would be expected.”
The authors noted limitations to their study and suggested that, had the experts been instructed to expect a different outcome (such as the suspension or restriction of practice), the chosen cut points might have been different.
“The range of performance that occurs in actual practice helped the experts reach consensus on cut points for minimally acceptable performance when the outcome would be to receive a recommendation to consider additional training,” they concluded.