Mammography interpretation difficulty affects more than number of errors

It’s to be expected that more challenging mammography interpretations would decrease resident performance, but research has demonstrated that the types of errors made will vary based on whether a scan is classified as difficult by expert attending physicians or residents themselves.

Specifically, when residents self-assess an interpretation as difficult, lower performance is due to decreasing specificity, while scans rated as difficult by experts feature lower sensitivity rates, according to a study published in the July issue of Academic Radiology.

Knowing this distinction, educators should maximize the effectiveness of training by including a mix of self- and expert-assessed difficult cases, according to lead author Lars J. Grimm, MD, MHS, of Duke University Medical Center in Durham, N.C., and colleagues.

Grimm and colleagues have previously examined educational tools, showing that error patterns by radiology residents can be captured using statistical and machine-learning models. To further the development of educational materials and better understand how mammography difficulty affects resident performance, they conducted a study in which seven radiology residents and three expert breast imagers reviewed 100 mammograms. These consisted of bilateral mediolateral oblique and craniocaudal views, and each mammogram was self-assessed and expert-assessed as either high or low difficulty.

Results showed the impact on various aspects of performance is different for self- and expert-assessed difficulties. Cases self-assessed by residents had a sensitivity of 0.614 for low difficulty and a sensitivity of 0.707 for high difficulty, while resident specificities fell from 0.905 to 0.583 between the low and high difficulty cases.

“Regarding self-assessed difficulty, as residents transition from low to high difficulty cases, their performance decreases as they maintain their sensitivity at the cost of specificity,” wrote Grimm and colleagues. “Residents are likely concerned about missing cancers for the cases that they themselves assessed as difficult and therefore their internal threshold for annotating a case is decreased. This results in a larger number of false-positive annotations with similar sensitivity.”
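The threshold effect the authors describe can be illustrated with a minimal sketch. The suspicion scores, labels and threshold values below are hypothetical and chosen only for illustration; they are not data from the study, and the function name is an assumption. The sketch shows how lowering the threshold for flagging a case can leave sensitivity unchanged while specificity falls.

```python
def sensitivity_specificity(scores, has_cancer, threshold):
    """Flag a case when its suspicion score meets the threshold,
    then return (sensitivity, specificity) for the flagged set."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, has_cancer):
        flagged = score >= threshold
        if truth and flagged:
            tp += 1      # cancer correctly flagged
        elif truth:
            fn += 1      # cancer missed
        elif flagged:
            fp += 1      # normal case flagged (false positive)
        else:
            tn += 1      # normal case correctly left alone
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reader suspicion scores, NOT data from the study.
scores = [0.9, 0.8, 0.3, 0.6, 0.5, 0.4, 0.2, 0.1]
has_cancer = [True, True, True, False, False, False, False, False]

# A stricter threshold versus a lowered one, as might happen
# when a reader perceives a case as difficult.
strict = sensitivity_specificity(scores, has_cancer, threshold=0.7)
lenient = sensitivity_specificity(scores, has_cancer, threshold=0.35)

print("strict  (sensitivity, specificity):", strict)
print("lenient (sensitivity, specificity):", lenient)
```

In this toy example the lowered threshold flags three additional normal cases without catching the one subtle cancer, so sensitivity stays the same while specificity drops.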

A different pattern is seen when analyzing the expert-assessed cases. Here, resident sensitivities fell from 0.796 to 0.558 between low and high difficulty, while specificities were relatively stable at 0.740 and 0.714 for low and high difficulty, respectively.

“This increase in false-negative interpretations is likely because of subtle masses that the experts were able to identify with difficulty but that were missed by the residents,” suggested the authors. “If the resident did not notice the mass, they might not have considered the cases difficult and as a result did not adjust their internal assessment threshold to maintain sensitivity. This likely explains the dramatic decrease in sensitivity while keeping the specificity comparable.”

Grimm and colleagues wrote that if training materials are chosen based only on expert assessment of difficulty, these materials will be biased toward cases where residents would exhibit low sensitivity, and teaching efforts could shift toward limiting missed abnormalities. Including scans that residents themselves rate as difficult would add a focus on reducing false positives and would improve resident confidence through training on cases they perceive as difficult.